Semantic Model Diff

…a change tracker for Power BI semantic models

Semantic models do not have version control. Not really. You can store .pbip files in Git, and Tabular Editor gives you a .bim file you can diff, but neither of those workflows answers the simplest question a team asks after an update cycle: what changed?

I do not mean “which file was touched.” I mean: which measures were modified, which columns were added, which relationships were removed, and what exactly is different in the DAX expression that someone edited last Tuesday. That is the question I kept running into, and the one I built this notebook to answer.

This is my third submission to the Fabric Semantic Link Developer Experience Challenge. The first two (a DAX unit test harness and a lakehouse validation notebook) test whether the model produces correct results. This one tests whether the model itself has changed, and tells you precisely how.

The problem

Every Power BI developer has been in this situation. A semantic model gets promoted from dev to production. Something breaks. A report shows unexpected numbers, or a relationship is missing, or a column that used to exist is gone. The first question is always: “What changed since last time?”

Without a structured answer, the debugging starts. Someone opens Tabular Editor and clicks through tables. Someone else opens the old .bim file in a text editor and does a manual comparison. If the model has 40 tables and 200 measures, that process takes long enough that you start questioning your career choices.

I wanted something I could run in a Fabric notebook that would snapshot the full model metadata, store it as JSON, and then produce a colour-coded diff between any two snapshots. No manual clicking. No text-editor heroics.

How it works

The notebook has two engines: a snapshot engine and a diff engine.

The snapshot engine connects to a semantic model via the Tabular Object Model (TOM), exposed through semantic-link-labs, and captures everything: tables, columns, measures, relationships, partitions, roles, and shared expressions. Each object gets recorded with all its properties. A table entry includes its name, description, hidden flag, data category, and type (regular table, calculated table, or calculation group). A measure entry includes the full DAX expression, format string, display folder, and whether it has a KPI attached.

The snapshot is saved as a plain JSON file in the attached Lakehouse’s Files area. File names encode the dataset name and UTC timestamp, so snapshots sort chronologically and you can keep as many as you need.

The diff engine takes any two snapshots and compares them object by object. Each object type has a natural key (for tables, the name; for columns, the table plus column name; for relationships, the relationship name). Objects are categorised as Added, Removed, or Modified. For modified objects, the engine reports exactly which properties changed and shows the old and new values side by side.

What gets captured

The snapshot covers seven object types:

  • Tables: name, description, hidden, data category, type
  • Columns: data type, format string, calculated column expression, display folder, sort-by column, summarize-by
  • Measures: DAX expression, format string, display folder, KPI presence
  • Relationships: from/to table and column, active flag, cross-filter direction, cardinality
  • Partitions: mode, source type, query or M expression
  • Roles: RLS row filter expressions per table
  • Shared expressions: M parameters and Power Query definitions

That last category matters more than you might expect. Shared expressions include connection strings and parameter values. If someone changed the data source path in a parameter, this diff catches it.

The diff output

The diff produces a summary table first: one row per object type, with counts of added, removed, and modified objects. In a clean promotion where nothing changed, all counts are zero and you get a green “No changes detected” message. That is the happy path.

When changes exist, the notebook renders a colour-coded DataFrame per object type. Green rows for additions, red for removals, amber for modifications. Modified rows show the specific property, old value, and new value on the same line. If a measure’s DAX expression changed, you see the full old expression and the full new expression right there.

The notebook also exports the full diff as a self-contained HTML file. I use that for sharing with team members who are not running notebooks. Drop it in a Teams channel or attach it to a pull request.

Using it

The workflow is four steps:

  1. Take a baseline snapshot before making changes
  2. Edit the model (in Power BI Desktop, Fabric Model View, Tabular Editor, wherever)
  3. Take a new snapshot
  4. Compare the two and review the diff

Configuration is two variables: DATASET (the name or GUID of the semantic model) and WORKSPACE (optional, defaults to the notebook’s workspace). Everything else runs from those values. Authentication is handled by Fabric, so there are no credentials to manage.

If you already have two snapshot files and just want to compare them, you can skip straight to step 4. The notebook auto-selects the two most recent snapshots if you do not specify file paths explicitly.

Why not just diff the .bim file?

A .bim file is JSON, so you can run a text diff on it. I have done that. The problem is that a .bim file is a single monolithic document. A text diff on a 15,000-line JSON file where someone reordered a few properties is not a useful diff. You get hundreds of lines of noise for a single meaningful change.

The structured diff in this notebook compares at the object level, not the text level. If a measure’s expression changed, you see that one measure and just the changed expression. Everything else is filtered out. The signal-to-noise ratio is massively better.

The .bim diff also does not work if you are comparing across environments. If you want to verify that a dev model matches production before promotion, you need to snapshot both and compare the metadata, not the files. This notebook handles that.

Where this goes next

Running this in a Fabric pipeline as a post-publish step is the obvious next move. Snapshot the model after every deployment, compare against the previous snapshot, and fail the pipeline (or post a Teams notification) if objects were unexpectedly removed.

Cross-environment comparison is another use case I am already running: snapshot Dev and Prod on the same day, diff them, and confirm that what was promoted is exactly what was intended.

A more ambitious extension would be inline DAX expression diffs using a line-diff algorithm instead of full expression replacement. The structure supports it. I have not built that part yet.

The notebook is submitted to the Fabric Notebook Gallery as part of the Semantic Link Developer Experience Challenge. If you have ever spent an afternoon trying to figure out what changed in a semantic model, this might save you the detective work.

Source code available here: vestergaardj/Semantic-Link-TestHarness: Semantic Model Test Harness: Unit & Regression Testing for DAX (using Semantic Link Labs)

Developer Experience Challenge – DAX Dependency Graph

The Fabric Semantic Link Developer Experience Challenge gave me a concrete reason to build something I had been meaning to build for a while. The result is a Fabric notebook that extracts every DAX measure from a Power BI semantic model, maps measure-to-measure dependencies into a directed graph, and produces a ranked complexity audit with risk ratings. This post is about what it does, how it works, and why the approach is useful beyond the challenge context.

The problem it solves

If you have worked on a Power BI semantic model that has grown organically over a few years, you already know the problem. Measures reference other measures. Those measures reference other measures. Nobody wrote it down. The person who built the original [Gross Margin Adj YTD] has since moved on, and the name was self-explanatory at the time.

Power BI Desktop shows you a measure’s DAX expression when you click on it. It does not show you which other measures that expression depends on, how deep the dependency chain goes, or whether any measure sits in a circular reference loop that the engine has been quietly working around. Tabular Editor helps, but it still requires manual navigation. There is no built-in view that answers “what are the ten most complex measures in this model, and which ones does everything else depend on?”

That is what this notebook answers.

Getting the measures out

The notebook uses sempy.fabric.list_measures() from Semantic Link to pull every measure from the target model in a single call. It returns a pandas DataFrame with measure name, parent table, DAX expression, visibility, and description per row.

measures_df = fabric.list_measures(dataset=DATASET_NAME, workspace=WORKSPACE_NAME)

Under the hood, Semantic Link connects via the Tabular Object Model (TOM) over the XMLA endpoint. Fabric handles authentication from the notebook’s identity. Two config values is all the setup needed:

WORKSPACE_NAME = None                      # None = current workspace
DATASET_NAME   = "YourSemanticModelName"

Then Run All.

Parsing measure references: three passes

The interesting part is working out which measures each expression actually references. DAX uses square bracket notation for both measures and columns: [Total Revenue] is a measure reference, Sales[Amount] is a column reference. The parser has to distinguish them correctly.

It does this in three passes:

  1. Strip string literals ("...") to avoid false positives. A FORMAT call like FORMAT([Date], "Total Revenue") would otherwise incorrectly register a dependency on [Total Revenue].
  2. Strip single-line and multi-line comments (// and /* */).
  3. Extract all [Name] patterns where the opening bracket is not preceded by a word character, digit, or apostrophe. That lookbehind excludes table-qualified references like Sales[Amount] and 'My Table'[Column].

The extracted names are then cross-referenced against the full set of known measure names. Anything that is not a measure name is discarded.

pattern = r"(?<![a-zA-Z0-9_'])\[([^\]]+)\]"
matches = re.findall(pattern, cleaned)
return list({m for m in matches if m in measure_names})

This correctly handles the common edge cases in real models. The one known limitation: measures referenced via SELECTEDMEASURE() or through a disconnected table SWITCH pattern cannot be resolved statically. If your model uses those patterns heavily, some dependencies will be missing from the graph.

Building the graph

Once the parser has run on every expression, the dependencies go into a NetworkX directed graph. Each measure is a node. An edge A -> B means “A’s DAX expression references measure B” — A depends on B.

The graph direction is important. It lets the tool compute:

  • In-degree (fan-in): how many measures depend on this one. High fan-in means “hub” measure. Breaking it cascades everywhere.
  • Out-degree (fan-out): how many measures this one calls. High fan-out means complex composition.
  • Longest path from any node: the transitive dependency depth.
  • Cycles: circular reference chains.

From those two properties alone, the next three analyses fall out naturally.

Five analyses

Dead measures. In-degree of zero means no other measure references this one. It might be a top-level report measure used directly in a visual, or it might be genuinely unused. The notebook flags all of them; cross-referencing with report usage is a follow-up step.

Root measures. Out-degree of zero means no dependencies on other measures. These are the foundation: the SUM(Sales[Amount]) base calculations that everything else builds on. Errors in root measures propagate silently through every measure above them.

Circular references. The notebook runs Johnson’s algorithm via nx.simple_cycles() to find every elementary cycle in the graph. In a well-designed model the result is: “No circular dependencies detected.” When it is not, the full chain is printed — A -> B -> C -> A — so you know exactly what to untangle.

Complexity scoring. Each measure gets a weighted composite score across six dimensions:

DimensionWeightRationale
CALCULATE / CALCULATETABLE count3Context transitions are the primary source of subtle DAX bugs
Max parenthesis nesting depth1Readability proxy
Branching (IF / SWITCH)2Code path count
Filter functions (FILTER, ALL, ALLEXCEPT, etc.)2Filter-context manipulation
Dependency depth (longest downstream path)2Transitive error amplification
Fan-out (direct measure references)1Composition width

The weights are the part most open to debate. I gave CALCULATE the highest weight because context-transition confusion is, in my experience, where DAX models accumulate the most invisible risk. The depth and branching weights reflect that those properties make a measure harder to verify than to write.

Visual dependency DAG. The notebook renders the graph with matplotlib. Node color encodes complexity score on a green-to-red scale. Node size encodes in-degree, so hub measures are physically larger. An optional pyvis interactive version renders inline in the notebook via an iframe: zoomable, draggable, with hover tooltips showing measure name, table, and score.

The audit report

The final step consolidates everything into a single DataFrame sorted by risk rating, then by descending complexity score within each tier.

Risk logic:

  • Critical: participates in any circular reference
  • High: complexity score ≥ 20
  • Medium: score between 10 and 19
  • Low: everything else

The summary banner above the table looks like this:

============================================================
  DAX DEPENDENCY GRAPH - AUDIT SUMMARY
============================================================
  Total measures:            147
  Dependency edges:          312
  Unreferenced measures:     38
  Root measures (no deps):   22
  Circular references:       0
  High complexity (>=20):    11
  Medium complexity (10-19): 29
============================================================

That is the starting point for any refactoring conversation. Eleven measures scoring High is a concrete prioritisation signal: start there, not with the 38 unreferenced ones.

Limitations worth knowing

The parser only resolves measure-to-measure dependencies. Column-level lineage is out of scope. Calculation groups are not modeled as nodes, so models that use them heavily will have gaps. Cross-model references (composite model / DirectQuery to external datasets) are not in scope either.

The “unreferenced” flag does not mean unused. It means not referenced by other measures. A measure used directly in 15 report visuals will still show as unreferenced in this graph, because the tool has no report-level visibility. That cross-reference is worth doing separately with sempy.fabric.list_reports() if you are planning to delete anything.

Getting the notebook

The notebook is on GitHub: github.com/vestergaardj/DDG-DAX-Dependency-Graph. Upload it to any Fabric workspace, set the two configuration values in the Configuration cell, and run all. Everything else is automatic.

It requires semantic-link-sempy, networkx, and matplotlib, all of which come pre-installed in Fabric. pyvis is optional; the static graph still renders without it.