Ten Years as a Microsoft MVP

Ten years. I keep counting it on my fingers like it might come out different the next time. It does not.

In early 2017, Microsoft awarded me the Data Platform MVP for the first time. I had been speaking at community events for about four years at that point, starting with a shaky user group presentation in Copenhagen and gradually building up to SQLSaturdays across Europe. When the notification mail arrived I read it three times. Then I called my wife. Then I read it again.

What I did not anticipate was how much the platform itself would transform in the decade ahead, or how completely my own technical identity would change alongside it. This is a reflection on both.

2016: On-premises and proud of it

When the MVP journey began, my world was SQL Server. Specifically, Analysis Services Multidimensional. I wrote MDX for a living, tuned partition strategies, scripted deployments in PowerShell, and built dimension security models with AMO. The toolchain was mature, the community was tight, and the assumption was that your servers lived in a rack somewhere you could point to.

Azure existed, of course. It was about six years old. But for most of us in the BI world it still felt like someone else’s problem. Azure SQL Database was limited. Azure Analysis Services was in preview. The idea that a serious analytical workload could run entirely in the cloud was something you heard at conference keynotes and politely ignored on Monday morning.

I remember attending a PASS Summit session around that time where someone demoed Azure Data Lake by showing a slide deck of screenshots. With errors in them. That was the state of cloud analytics in our corner of the Microsoft ecosystem. Promising on paper. Not quite there in practice.

2017 to 2020: The cloud came whether we were ready or not

Each year I renewed the MVP award, Azure had added another service that pulled more of my on-premises workload toward the cloud. Azure Data Factory replaced chunks of what SSIS used to do. Azure Synapse started absorbing the data warehouse conversation. Managed identities appeared, and with them a new category of debugging I had never encountered before: authentication that worked in test and mysteriously failed in production with no clear error message.

The learning curve was not optional. I recertified Azure exams not for the credential itself but because the platform had changed enough that I needed to confirm I still understood what I was deploying. Every renewal cycle felt like a different job.

During this same stretch, Power BI went from “promising” to “the center of gravity for everything Microsoft BI.” It shipped in mid-2015, and within two years the conversation at every community event had shifted from “should we look at Power BI?” to “how do we govern hundreds of workspaces?” I took over running the Danish Power BI User Group and watched it grow past 2,000 members. The problems I solved daily moved from MDX tuning to DAX optimization, from partition scripting to deployment pipelines and semantic model management. Different skill set. Same instinct: figure it out, test it, share what happened.

2021 to 2023: The convergence

Then the pieces started merging. Azure Synapse tried to be the unified analytics service. Power BI Premium brought dataset hosting into the cloud. Microsoft kept drawing a tighter circle around what had been separate products, pushing toward something integrated.

And then Fabric arrived.

Announced at Build in May 2023, generally available by November that year. Within months it had absorbed the Power BI service, introduced Lakehouses, OneLake as a unified storage layer, Data Pipelines, and Notebooks as a first-class authoring experience. The pitch was simple: one platform for data engineering, data science, real-time analytics, and business intelligence. No more stitching five Azure services together and hoping the authentication tokens lined up.

2024 to now: A different job

The speed of Fabric’s evolution catches me off guard sometimes. I spent years becoming deeply skilled in SSAS Multidimensional. That knowledge still informs how I think about models, but it is no longer the center of my work. The architecture underneath shifted fundamentally. Lakehouses replaced cubes. Notebooks replaced XMLA scripts. Semantic models became the new version of what we used to call datasets, which themselves replaced what we used to call cubes.

Each rename is not just marketing. Each one reflects a genuine change in how the technology works and what you can build with it.

Today I write Python in Fabric Notebooks to snapshot and diff semantic model metadata. Materialized Lake Views went GA. Fabric IQ brought SQL-familiar tooling to the Lakehouse. I have found myself using AI to reverse-engineer Power BI models and building open-source tooling on top of the Tabular Object Model through semantic-link-labs. The problems I solve in 2026 look nothing like the problems I solved in 2016.

What stayed the same

The community.

That is the thread that runs through all ten years without interruption. The tools changed completely. The community changed shape too, but it did not break. PASS dissolved. SQLSaturday became Data Saturday. The #sqlfamily hashtag gave way to broader data community networks. But the people kept showing up.

The person who initially nominated me for the MVP award: Mark Broadbent (l|b|m|x). The organizer who let me both speak and help run events for years. The group of Danes who went to dinner together after Summit sessions in Seattle. The local organizers in Dublin, Gothenburg, Chicago, Prague, Pittsburgh, New York, Krakov, Stockholm, London, Newport, Cambridge, Hanau, Munich, Thorshavn, Utrecht…, who each time made space for someone to come talk about whatever they had just figured out. Every one of those moments is part of this decade.

I still organize Data Saturday Denmark. Some years it goes smoothly. Some years 95 people register and do not show up, and you have to write the uncomfortable post about it, with new rules and transparent reasoning, because that is what organizing actually looks like: not just running the event but protecting its future.

The parallel timelines

If I map my MVP journey against the platform’s, the parallels are hard to miss:

When I got the award, the platform was SQL Server with Azure as a sidecar. My expertise was multidimensional cubes and MDX. The community gathered under the PASS umbrella.

Five years in, the platform was Azure-first with Power BI as the analytical surface. My expertise had shifted to data modeling, deployment pipelines, and cloud administration. The community was regrouping after PASS shut down.

Now, at ten years, the platform is Fabric. My expertise is Lakehouses, semantic models, Notebooks, and AI-assisted development. The community runs on Data Saturdays, LinkedIn, and local user groups that feel smaller but more committed.

Three different jobs descriptions (same place) over one continuous award. The MVP title stayed the same on paper. Everything underneath it turned over at least twice.

What I take from it

I do not have a tidy lesson. I have an observation. The practitioners who stayed relevant through these shifts were not the ones who predicted the roadmap correctly. They were the ones who kept testing things with real data, publishing their mistakes, and showing up to events where they might learn something new. That is what I tried to do. Some years better than others.

Ten years. Same instinct, completely different platform.

Semantic Model Diff

…a change tracker for Power BI semantic models

Semantic models do not have version control. Not really. You can store .pbip files in Git, and Tabular Editor gives you a .bim file you can diff, but neither of those workflows answers the simplest question a team asks after an update cycle: what changed?

I do not mean “which file was touched.” I mean: which measures were modified, which columns were added, which relationships were removed, and what exactly is different in the DAX expression that someone edited last Tuesday. That is the question I kept running into, and the one I built this notebook to answer.

This is my third submission to the Fabric Semantic Link Developer Experience Challenge. The first two (a DAX unit test harness and a lakehouse validation notebook) test whether the model produces correct results. This one tests whether the model itself has changed, and tells you precisely how.

The problem

Every Power BI developer has been in this situation. A semantic model gets promoted from dev to production. Something breaks. A report shows unexpected numbers, or a relationship is missing, or a column that used to exist is gone. The first question is always: “What changed since last time?”

Without a structured answer, the debugging starts. Someone opens Tabular Editor and clicks through tables. Someone else opens the old .bim file in a text editor and does a manual comparison. If the model has 40 tables and 200 measures, that process takes long enough that you start questioning your career choices.

I wanted something I could run in a Fabric notebook that would snapshot the full model metadata, store it as JSON, and then produce a colour-coded diff between any two snapshots. No manual clicking. No text-editor heroics.

How it works

The notebook has two engines: a snapshot engine and a diff engine.

The snapshot engine connects to a semantic model via the Tabular Object Model (TOM), exposed through semantic-link-labs, and captures everything: tables, columns, measures, relationships, partitions, roles, and shared expressions. Each object gets recorded with all its properties. A table entry includes its name, description, hidden flag, data category, and type (regular table, calculated table, or calculation group). A measure entry includes the full DAX expression, format string, display folder, and whether it has a KPI attached.

The snapshot is saved as a plain JSON file in the attached Lakehouse’s Files area. File names encode the dataset name and UTC timestamp, so snapshots sort chronologically and you can keep as many as you need.

The diff engine takes any two snapshots and compares them object by object. Each object type has a natural key (for tables, the name; for columns, the table plus column name; for relationships, the relationship name). Objects are categorised as Added, Removed, or Modified. For modified objects, the engine reports exactly which properties changed and shows the old and new values side by side.

What gets captured

The snapshot covers seven object types:

  • Tables: name, description, hidden, data category, type
  • Columns: data type, format string, calculated column expression, display folder, sort-by column, summarize-by
  • Measures: DAX expression, format string, display folder, KPI presence
  • Relationships: from/to table and column, active flag, cross-filter direction, cardinality
  • Partitions: mode, source type, query or M expression
  • Roles: RLS row filter expressions per table
  • Shared expressions: M parameters and Power Query definitions

That last category matters more than you might expect. Shared expressions include connection strings and parameter values. If someone changed the data source path in a parameter, this diff catches it.

The diff output

The diff produces a summary table first: one row per object type, with counts of added, removed, and modified objects. In a clean promotion where nothing changed, all counts are zero and you get a green “No changes detected” message. That is the happy path.

When changes exist, the notebook renders a colour-coded DataFrame per object type. Green rows for additions, red for removals, amber for modifications. Modified rows show the specific property, old value, and new value on the same line. If a measure’s DAX expression changed, you see the full old expression and the full new expression right there.

The notebook also exports the full diff as a self-contained HTML file. I use that for sharing with team members who are not running notebooks. Drop it in a Teams channel or attach it to a pull request.

Using it

The workflow is four steps:

  1. Take a baseline snapshot before making changes
  2. Edit the model (in Power BI Desktop, Fabric Model View, Tabular Editor, wherever)
  3. Take a new snapshot
  4. Compare the two and review the diff

Configuration is two variables: DATASET (the name or GUID of the semantic model) and WORKSPACE (optional, defaults to the notebook’s workspace). Everything else runs from those values. Authentication is handled by Fabric, so there are no credentials to manage.

If you already have two snapshot files and just want to compare them, you can skip straight to step 4. The notebook auto-selects the two most recent snapshots if you do not specify file paths explicitly.

Why not just diff the .bim file?

A .bim file is JSON, so you can run a text diff on it. I have done that. The problem is that a .bim file is a single monolithic document. A text diff on a 15,000-line JSON file where someone reordered a few properties is not a useful diff. You get hundreds of lines of noise for a single meaningful change.

The structured diff in this notebook compares at the object level, not the text level. If a measure’s expression changed, you see that one measure and just the changed expression. Everything else is filtered out. The signal-to-noise ratio is massively better.

The .bim diff also does not work if you are comparing across environments. If you want to verify that a dev model matches production before promotion, you need to snapshot both and compare the metadata, not the files. This notebook handles that.

Where this goes next

Running this in a Fabric pipeline as a post-publish step is the obvious next move. Snapshot the model after every deployment, compare against the previous snapshot, and fail the pipeline (or post a Teams notification) if objects were unexpectedly removed.

Cross-environment comparison is another use case I am already running: snapshot Dev and Prod on the same day, diff them, and confirm that what was promoted is exactly what was intended.

A more ambitious extension would be inline DAX expression diffs using a line-diff algorithm instead of full expression replacement. The structure supports it. I have not built that part yet.

The notebook is submitted to the Fabric Notebook Gallery as part of the Semantic Link Developer Experience Challenge. If you have ever spent an afternoon trying to figure out what changed in a semantic model, this might save you the detective work.

Source code available here: vestergaardj/Semantic-Link-TestHarness: Semantic Model Test Harness: Unit & Regression Testing for DAX (using Semantic Link Labs)

Validating DAX Against Your Lakehouse with Semantic Link

A semantic model is a promise. It promises that the numbers in your reports match the data in your lakehouse. But after enough model changes, renamed columns, new relationships, and tweaked measures, that promise gets harder to verify. I wanted a way to check it programmatically.

This is my second submission to the Fabric Semantic Link Developer Experience Challenge. The first was a DAX unit test harness that compares measures against hardcoded expected values. That works well for known business rules, but it has a limitation: someone has to decide and maintain what the “right” answer is. For a model with hundreds of measures across dozens of filter contexts, that does not scale.

So I built something different. Instead of hardcoding expected values, I use the Lakehouse as the ground truth.

The idea

If your semantic model sits on top of a Fabric Lakehouse, then both the DAX layer and the SQL layer should agree on the same numbers. A COUNTROWS('fact_ticket_metrics') in DAX should return the same count as SELECT COUNT(*) FROM Gold.fact_ticket_metrics in Spark SQL. If they diverge, something changed in the model that needs attention.

The notebook takes pairs of queries: one DAX, one SQL. It executes both, normalizes the results into comparable DataFrames, and reports pass or fail. The Lakehouse is the single source of truth.

How it works

Each test case is a dictionary with a description, a DAX query, and a SQL query. Optionally, you can specify sort columns, a floating-point tolerance, or a column mapping if the names differ between the two result sets.

test_cases = [
    {
        "description": "Row count tickets",
        "dax_query": """
            EVALUATE
            ROW("RowCount", COUNTROWS('fact_ticket_metrics'))
        """,
        "sql_query": """
            SELECT COUNT(*) AS RowCount
            FROM Gold.fact_ticket_metrics
        """,
    },
    {
        "description": "Total # Incidents",
        "dax_query": """
            EVALUATE
            ROW("fact_ticket_metrics", [Incidents])
        """,
        "sql_query": """
            SELECT COUNT(id) AS Incidents
            FROM Gold.fact_ticket_metrics
        """,
        "tolerance": 0.01,
    },
]

The harness loops through each test case, executes the DAX query via sempy_labs.evaluate_dax_impersonation, executes the SQL query via Spark, and then compares the two result DataFrames.

Column normalization

DAX returns column names in the format 'Table'[Column]. SQL returns plain column names. If you want to compare them, those names need to match.

A small regex function strips DAX table qualifiers: 'Sales'[Amount] becomes Amount, and [Amount] also becomes Amount. After normalization, the harness aligns both DataFrames on their common columns, sorted alphabetically. Any extra columns on either side get flagged as warnings but do not block the comparison.

If the normalized names still do not match (say the DAX column is RowCount but the SQL column is row_count), you can pass a column_mapping dictionary to handle the translation explicitly.

Floating-point tolerance

Not every number comparison should demand exact equality. Aggregations in DAX and Spark can produce slightly different floating-point results depending on processing order and precision. The harness uses numpy.isclose with a configurable relative tolerance (default: 0.0001) for numeric columns. String columns are compared as exact matches.

When a numeric mismatch exceeds the tolerance, the harness reports the specific column, row, DAX value, and SQL value. It caps the output at five mismatches per column to keep the report readable when something is broadly wrong.

The comparison engine

The comparison works at the DataFrame level, not just scalar values. This matters because many useful validation queries return multiple rows: a count per category, a sum per month, a distinct count per workspace. Scalar-only testing misses structural issues like missing rows or extra groupings.

The engine does three things in sequence:

  1. Aligns both DataFrames on their common columns and sorts them consistently
  2. Compares numeric columns with tolerance, string columns with exact match
  3. Collects mismatches into a diff DataFrame for inspection

A shape mismatch (different number of rows or columns) is an immediate failure. You get the exact dimensions from both sides so you know whether the issue is missing data or a query that groups differently.

What I tested against

I ran this against a Zendesk reporting model that sits on a Gold-layer lakehouse. The model has ticket metrics, incident counts, and support analytics. The test cases validated that the semantic model’s row counts and measure aggregations matched the underlying SQL tables.

This is the kind of model where schema drift is common. New ticket categories get added, fields get renamed upstream, and the Gold layer evolves. Having an automated check that the DAX layer still reflects reality saves the awkward moment when someone asks why the dashboard numbers do not match the data export.

How this differs from unit testing

My other submission, the Semantic Model Test Harness, validates DAX against hardcoded expected values. That is unit testing: does this specific measure, with this specific filter, return this specific number?

This notebook is closer to integration testing. It validates that the semantic model agrees with its source data. The two approaches complement each other:

  • Unit tests catch business logic errors (a measure formula was changed incorrectly)
  • Lakehouse comparison tests catch data layer drift (the model no longer reflects what is in the tables)

Running both gives you confidence from two different angles.

Where this goes next

The test case format supports multi-row comparisons, so extending this to validate entire dimension tables (not just aggregated measures) is straightforward. I can also see connecting this to Fabric pipeline orchestration, running the comparison notebook as a post-refresh step to detect drift immediately after data lands.

Another natural extension: generating test cases automatically by introspecting the semantic model’s measures and matching them to lakehouse tables. Semantic Link Labs has functions for listing model metadata that could feed into a test case generator. I have not built that yet, but the structure is there.

The notebook is submitted to the Fabric Notebook Gallery as part of the Semantic Link Developer Experience Challenge. If you are running semantic models on top of a Lakehouse and have ever wondered whether they still agree, this might save you some manual checking.

Unit Testing DAX with Semantic Link

Every BI developer has felt it. You change a measure, update a relationship, or rename a column in a semantic model, and then you spend the next hour clicking through report pages to check if something broke. Manual spot-checking is how most teams validate DAX today. It works until it does not.

I have been building and maintaining semantic models for years. The further I get into Fabric-based development, the more my models start to feel like production code. They power dashboards that drive decisions. They feed downstream pipelines. When something breaks, the blast radius is real. And yet, the testing story has always been: deploy, open the report, squint at the numbers.

That gap bothered me enough to do something about it.

The challenge

Microsoft recently launched the Fabric Semantic Link Developer Experience Challenge, a community contest focused on building reusable tools that improve how teams develop, test, document, and maintain semantic models in Microsoft Fabric. The requirement: use Semantic Link as a core component and solve a real developer pain point.

I have been eyeing Semantic Link Labs for a while. The library exposes evaluate_dax_impersonation, which lets you execute arbitrary DAX queries against a Fabric semantic model from a notebook. That single function is what makes programmatic testing possible.

The idea for my submission: a test harness that brings unit testing and regression detection to DAX measures. Define your test cases. Run them. Get a pass/fail report. No browser required.

What I built

The Semantic Model Test Harness is a single Fabric notebook. No external services, no complex infrastructure. You define test cases as rows in a pandas DataFrame, each row specifying three things:

  • The DAX measure to evaluate
  • The filter context to apply (a DAX boolean expression simulating a slicer or page filter)
  • The expected value

Here is what a test case definition looks like:

dax_tests = pd.DataFrame([
    {
        "measure": "# Reports",
        "filter_context": "'Catalog - Report'[Report Workspace] = 'Arla DK'",
        "expected_value": 61
    },
    {
        "measure": "# Workspace Users",
        "filter_context": "'Catalog - Workspace'[Workspace] = 'CatMan Next DK - Demo'",
        "expected_value": 15
    },
])

Each test case gets transformed into a EVALUATE ROW(...) DAX query that wraps the measure in a CALCULATE with the specified filter. The harness sends that query to the semantic model via sempy_labs.evaluate_dax_impersonation(), compares the result to the expected value, and records pass or fail.

DAX query generation

One thing I had to sort out early: DAX has opinions about quoting. Single quotes wrap table names, double quotes wrap string values. Filter expressions like 'Catalog - Report'[Report Workspace] = 'Arla DK' need the 'Arla DK' portion converted to "Arla DK" before execution. A small regex helper handles that conversion automatically.

The harness also distinguishes between boolean filter expressions (like 'Table'[Column] = "Value") and table function expressions (like FILTER(...) or VALUES(...)). Both are valid in a CALCULATE, but the detection matters for correct query construction. A simple heuristic checks for DAX table function prefixes and falls back to boolean if none are found.

The generated DAX query for a test case looks like this:

EVALUATE ROW("Value", CALCULATE([# Reports], 'Catalog - Report'[Report Workspace] = "Arla DK"))

Running the tests

Execution is straightforward. The harness loops through every row in the test DataFrame, builds the DAX query, sends it to the model, and collects results. Each test produces a row in the results DataFrame showing the measure, filter context, expected value, actual value, pass/fail status, and the exact DAX query used.

If a test fails because of a connectivity error, invalid DAX, or anything else unexpected, the exception is caught and logged as a failure with the error message preserved. No silent swallowing of errors.

The results summary counts passes and failures. Failed tests get highlighted separately so they stand out:

Total tests: 2
Passed: 2
Failed: 0
✅ All tests passed!

When something does fail, you get the actual value alongside the expected value, plus the DAX query that was sent. That gives you everything needed to diagnose whether the issue is in the model, the test definition, or the filter context.

The model I tested against

I ran this against my CatMan BI Tenant Stats semantic model, a model I maintain for Power BI tenant administration and monitoring. It tracks workspaces, reports, datasets, users, activity, and permissions across our organization’s Power BI tenant. The model has 17 tables covering catalog metadata, user licensing, activity logs, and calendar dimensions.

This is a model that changes regularly as new workspaces spin up, users rotate, and reporting patterns shift. Exactly the kind of model where silent measure breakage is a real risk.

What I learned

Test case design is the hard part. Writing the harness code was relatively quick. Deciding which measures to test, with which filter contexts, and what counts as a correct expected value requires genuine domain knowledge. This is not something you can auto-generate meaningfully. You need a human who knows the business logic.

Filter context quoting will trip you up. DAX’s quoting rules are well-documented, but in practice, switching between single and double quotes across table names and string values is a reliable source of errors when constructing queries programmatically. The regex helper saved me repeated debugging sessions.

evaluate_dax_impersonation is the unlock. Without this function from Semantic Link Labs, you would need to stand up a XMLA endpoint connection, handle authentication separately, and manage the query lifecycle yourself. Semantic Link wraps all of that. The function takes a dataset name, a workspace name, and a DAX query string, then returns a DataFrame. That simplicity is what makes a notebook-based test harness practical.

Regression testing needs baselines. The current harness compares against hardcoded expected values. For a production CI/CD integration, you would want a baseline snapshot mechanism: run tests, store results, then compare future runs against the stored baseline rather than manually maintained numbers. I have not built that yet, but the architecture supports it.

Where this goes next

The notebook is designed to be dropped into a Fabric workspace and run on demand or triggered as part of a deployment pipeline. Fabric notebooks can be orchestrated through pipelines, so running this harness as a post-deployment validation step is a natural fit.

I can also see extending the test case format to include tolerance thresholds for measures that fluctuate (like row counts on live data) rather than requiring exact matches. And grouping tests by business domain or model area would help when you want to run a targeted suite after changing a specific part of the model.

For now, it works. I define my tests, I run the notebook, and I get a clear answer: did something break, or is the model still behaving as expected? That is a better answer than opening six report pages and eyeballing numbers.

The notebook is submitted to the Fabric Notebook Gallery as part of the Semantic Link Developer Experience Challenge. If you are maintaining semantic models in Fabric and have felt that same testing gap, give it a try. Let me know in the comments if you find it useful, or if you run into edge cases I have not covered.

Developer Experience Challenge – DAX Dependency Graph

The Fabric Semantic Link Developer Experience Challenge gave me a concrete reason to build something I had been meaning to build for a while. The result is a Fabric notebook that extracts every DAX measure from a Power BI semantic model, maps measure-to-measure dependencies into a directed graph, and produces a ranked complexity audit with risk ratings. This post is about what it does, how it works, and why the approach is useful beyond the challenge context.

The problem it solves

If you have worked on a Power BI semantic model that has grown organically over a few years, you already know the problem. Measures reference other measures. Those measures reference other measures. Nobody wrote it down. The person who built the original [Gross Margin Adj YTD] has since moved on, and the name was self-explanatory at the time.

Power BI Desktop shows you a measure’s DAX expression when you click on it. It does not show you which other measures that expression depends on, how deep the dependency chain goes, or whether any measure sits in a circular reference loop that the engine has been quietly working around. Tabular Editor helps, but it still requires manual navigation. There is no built-in view that answers “what are the ten most complex measures in this model, and which ones does everything else depend on?”

That is what this notebook answers.

Getting the measures out

The notebook uses sempy.fabric.list_measures() from Semantic Link to pull every measure from the target model in a single call. It returns a pandas DataFrame with measure name, parent table, DAX expression, visibility, and description per row.

measures_df = fabric.list_measures(dataset=DATASET_NAME, workspace=WORKSPACE_NAME)

Under the hood, Semantic Link connects via the Tabular Object Model (TOM) over the XMLA endpoint. Fabric handles authentication from the notebook’s identity. Two config values is all the setup needed:

WORKSPACE_NAME = None                      # None = current workspace
DATASET_NAME   = "YourSemanticModelName"

Then Run All.

Parsing measure references: three passes

The interesting part is working out which measures each expression actually references. DAX uses square bracket notation for both measures and columns: [Total Revenue] is a measure reference, Sales[Amount] is a column reference. The parser has to distinguish them correctly.

It does this in three passes:

  1. Strip string literals ("...") to avoid false positives. A FORMAT call like FORMAT([Date], "Total Revenue") would otherwise incorrectly register a dependency on [Total Revenue].
  2. Strip single-line and multi-line comments (// and /* */).
  3. Extract all [Name] patterns where the opening bracket is not preceded by a word character, digit, or apostrophe. That lookbehind excludes table-qualified references like Sales[Amount] and 'My Table'[Column].

The extracted names are then cross-referenced against the full set of known measure names. Anything that is not a measure name is discarded.

pattern = r"(?<![a-zA-Z0-9_'])\[([^\]]+)\]"
matches = re.findall(pattern, cleaned)
return list({m for m in matches if m in measure_names})

This correctly handles the common edge cases in real models. The one known limitation: measures referenced via SELECTEDMEASURE() or through a disconnected table SWITCH pattern cannot be resolved statically. If your model uses those patterns heavily, some dependencies will be missing from the graph.

Building the graph

Once the parser has run on every expression, the dependencies go into a NetworkX directed graph. Each measure is a node. An edge A -> B means “A’s DAX expression references measure B” — A depends on B.

The graph direction is important. It lets the tool compute:

  • In-degree (fan-in): how many measures depend on this one. High fan-in means “hub” measure. Breaking it cascades everywhere.
  • Out-degree (fan-out): how many measures this one calls. High fan-out means complex composition.
  • Longest path from any node: the transitive dependency depth.
  • Cycles: circular reference chains.

From those two properties alone, the next three analyses fall out naturally.

Five analyses

Dead measures. In-degree of zero means no other measure references this one. It might be a top-level report measure used directly in a visual, or it might be genuinely unused. The notebook flags all of them; cross-referencing with report usage is a follow-up step.

Root measures. Out-degree of zero means no dependencies on other measures. These are the foundation: the SUM(Sales[Amount]) base calculations that everything else builds on. Errors in root measures propagate silently through every measure above them.

Circular references. The notebook runs Johnson’s algorithm via nx.simple_cycles() to find every elementary cycle in the graph. In a well-designed model the result is: “No circular dependencies detected.” When it is not, the full chain is printed — A -> B -> C -> A — so you know exactly what to untangle.

Complexity scoring. Each measure gets a weighted composite score across six dimensions:

DimensionWeightRationale
CALCULATE / CALCULATETABLE count3Context transitions are the primary source of subtle DAX bugs
Max parenthesis nesting depth1Readability proxy
Branching (IF / SWITCH)2Code path count
Filter functions (FILTER, ALL, ALLEXCEPT, etc.)2Filter-context manipulation
Dependency depth (longest downstream path)2Transitive error amplification
Fan-out (direct measure references)1Composition width

The weights are the part most open to debate. I gave CALCULATE the highest weight because context-transition confusion is, in my experience, where DAX models accumulate the most invisible risk. The depth and branching weights reflect that those properties make a measure harder to verify than to write.

Visual dependency DAG. The notebook renders the graph with matplotlib. Node color encodes complexity score on a green-to-red scale. Node size encodes in-degree, so hub measures are physically larger. An optional pyvis interactive version renders inline in the notebook via an iframe: zoomable, draggable, with hover tooltips showing measure name, table, and score.

The audit report

The final step consolidates everything into a single DataFrame sorted by risk rating, then by descending complexity score within each tier.

Risk logic:

  • Critical: participates in any circular reference
  • High: complexity score ≥ 20
  • Medium: score between 10 and 19
  • Low: everything else

The summary banner above the table looks like this:

============================================================
  DAX DEPENDENCY GRAPH - AUDIT SUMMARY
============================================================
  Total measures:            147
  Dependency edges:          312
  Unreferenced measures:     38
  Root measures (no deps):   22
  Circular references:       0
  High complexity (>=20):    11
  Medium complexity (10-19): 29
============================================================

That is the starting point for any refactoring conversation. Eleven measures scoring High is a concrete prioritisation signal: start there, not with the 38 unreferenced ones.

Limitations worth knowing

The parser only resolves measure-to-measure dependencies. Column-level lineage is out of scope. Calculation groups are not modeled as nodes, so models that use them heavily will have gaps. Cross-model references (composite model / DirectQuery to external datasets) are not in scope either.

The “unreferenced” flag does not mean unused. It means not referenced by other measures. A measure used directly in 15 report visuals will still show as unreferenced in this graph, because the tool has no report-level visibility. That cross-reference is worth doing separately with sempy.fabric.list_reports() if you are planning to delete anything.

Getting the notebook

The notebook is on GitHub: github.com/vestergaardj/DDG-DAX-Dependency-Graph. Upload it to any Fabric workspace, set the two configuration values in the Configuration cell, and run all. Everything else is automatic.

It requires semantic-link-sempy, networkx, and matplotlib, all of which come pre-installed in Fabric. pyvis is optional; the static graph still renders without it.