A semantic model is a promise. It promises that the numbers in your reports match the data in your lakehouse. But after enough model changes, renamed columns, new relationships, and tweaked measures, that promise gets harder to verify. I wanted a way to check it programmatically.
This is my second submission to the Fabric Semantic Link Developer Experience Challenge. The first was a DAX unit test harness that compares measures against hardcoded expected values. That works well for known business rules, but it has a limitation: someone has to decide what the “right” answer is and keep those expected values current. For a model with hundreds of measures across dozens of filter contexts, that does not scale.
So I built something different. Instead of hardcoding expected values, I use the Lakehouse as the ground truth.
The idea
If your semantic model sits on top of a Fabric Lakehouse, then both the DAX layer and the SQL layer should agree on the same numbers. A COUNTROWS('fact_ticket_metrics') in DAX should return the same count as SELECT COUNT(*) FROM Gold.fact_ticket_metrics in Spark SQL. If they diverge, something changed in the model that needs attention.
The notebook takes pairs of queries: one DAX, one SQL. It executes both, normalizes the results into comparable DataFrames, and reports pass or fail. The Lakehouse is the single source of truth.
How it works
Each test case is a dictionary with a description, a DAX query, and a SQL query. Optionally, you can specify sort columns, a floating-point tolerance, or a column mapping if the names differ between the two result sets.
test_cases = [
    {
        "description": "Row count tickets",
        "dax_query": """
            EVALUATE
            ROW("RowCount", COUNTROWS('fact_ticket_metrics'))
        """,
        "sql_query": """
            SELECT COUNT(*) AS RowCount
            FROM Gold.fact_ticket_metrics
        """,
    },
    {
        "description": "Total # Incidents",
        "dax_query": """
            EVALUATE
            ROW("Incidents", [Incidents])
        """,
        "sql_query": """
            SELECT COUNT(id) AS Incidents
            FROM Gold.fact_ticket_metrics
        """,
        "tolerance": 0.01,  # relative tolerance for the numeric comparison
    },
]
The harness loops through each test case, executes the DAX query via sempy_labs.evaluate_dax_impersonation, executes the SQL query via Spark, and then compares the two result DataFrames.
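Condensed, the loop looks roughly like this. It is a sketch rather than the notebook verbatim: compare_dataframes is the comparison engine described below, and spark is the session a Fabric notebook already provides.

import sempy_labs as labs

def run_tests(dataset: str, workspace: str, test_cases: list) -> list:
    results = []
    for case in test_cases:
        # DAX side: evaluated against the semantic model via Semantic Link Labs
        dax_df = labs.evaluate_dax_impersonation(
            dataset=dataset, dax_query=case["dax_query"], workspace=workspace
        )
        # SQL side: evaluated against the Lakehouse via the notebook's Spark session
        sql_df = spark.sql(case["sql_query"]).toPandas()
        passed, detail = compare_dataframes(dax_df, sql_df, case)  # see below
        results.append({"description": case["description"], "passed": passed, "detail": detail})
    return results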
Column normalization
DAX returns column names in the format 'Table'[Column]. SQL returns plain column names. If you want to compare them, those names need to match.
A small regex function strips DAX table qualifiers: 'Sales'[Amount] becomes Amount, and [Amount] also becomes Amount. After normalization, the harness aligns both DataFrames on their common columns, sorted alphabetically. Any extra columns on either side get flagged as warnings but do not block the comparison.
If the normalized names still do not match (say the DAX column is RowCount but the SQL column is row_count), you can pass a column_mapping dictionary to handle the translation explicitly.
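For illustration, the qualifier stripping can be a single regex substitution. This is a minimal version of the idea, not necessarily the exact expression in the notebook:

import re

def strip_dax_qualifier(name: str) -> str:
    # 'Sales'[Amount] -> Amount; [Amount] -> Amount; plain names pass through unchanged
    return re.sub(r"^.*\[([^\]]+)\]$", r"\1", name)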
Floating-point tolerance
Not every number comparison should demand exact equality. Aggregations in DAX and Spark can produce slightly different floating-point results depending on processing order and precision. The harness uses numpy.isclose with a configurable relative tolerance (default: 0.0001) for numeric columns. String columns are compared as exact matches.
When a numeric mismatch exceeds the tolerance, the harness reports the specific column, row, DAX value, and SQL value. It caps the output at five mismatches per column to keep the report readable when something is broadly wrong.
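To make the default tolerance concrete:

import numpy as np

# With the default relative tolerance of 0.0001 (0.01%):
np.isclose(1_000_000.0, 1_000_050.0, rtol=1e-4)  # True: within 0.01% of each other
np.isclose(1_000_000.0, 1_000_200.0, rtol=1e-4)  # False: differs by 0.02%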
The comparison engine
The comparison works at the DataFrame level, not just scalar values. This matters because many useful validation queries return multiple rows: a count per category, a sum per month, a distinct count per workspace. Scalar-only testing misses structural issues like missing rows or extra groupings.
The engine does three things in sequence:
- Aligns both DataFrames on their common columns and sorts them consistently
- Compares numeric columns with tolerance, string columns with exact match
- Collects mismatches into a diff DataFrame for inspection
A shape mismatch (different number of rows or columns) is an immediate failure. You get the exact dimensions from both sides so you know whether the issue is missing data or a query that groups differently.
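Putting the three steps together, here is a minimal sketch of the engine. It assumes pandas DataFrames on both sides and reuses the strip_dax_qualifier helper from above; names and structure are illustrative rather than the notebook verbatim.

import numpy as np
import pandas as pd

def compare_dataframes(dax_df: pd.DataFrame, sql_df: pd.DataFrame, case: dict):
    # Step 1: normalize DAX column names, apply any explicit mapping, align and sort
    dax_df = dax_df.rename(columns=strip_dax_qualifier)
    dax_df = dax_df.rename(columns=case.get("column_mapping", {}))
    common = sorted(set(dax_df.columns) & set(sql_df.columns))
    sort_cols = case.get("sort_columns", common)
    dax_df = dax_df[common].sort_values(sort_cols).reset_index(drop=True)
    sql_df = sql_df[common].sort_values(sort_cols).reset_index(drop=True)

    # A shape mismatch is an immediate failure
    if dax_df.shape != sql_df.shape:
        return False, f"shape mismatch: DAX {dax_df.shape} vs SQL {sql_df.shape}"

    # Step 2: numeric columns with tolerance, everything else as exact match
    rtol = case.get("tolerance", 1e-4)
    mismatches = []
    for col in common:
        if pd.api.types.is_numeric_dtype(sql_df[col]):
            bad = ~np.isclose(dax_df[col].astype(float), sql_df[col].astype(float), rtol=rtol)
        else:
            bad = (dax_df[col].astype(str) != sql_df[col].astype(str)).to_numpy()
        # Step 3: collect mismatches into a diff, capped at five per column
        for i in dax_df.index[bad][:5]:
            mismatches.append({"column": col, "row": i,
                               "dax": dax_df.at[i, col], "sql": sql_df.at[i, col]})
    return len(mismatches) == 0, pd.DataFrame(mismatches)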
What I tested against
I ran this against a Zendesk reporting model that sits on a Gold-layer lakehouse. The model has ticket metrics, incident counts, and support analytics. The test cases validated that the semantic model’s row counts and measure aggregations matched the underlying SQL tables.
This is the kind of model where schema drift is common. New ticket categories get added, fields get renamed upstream, and the Gold layer evolves. Having an automated check that the DAX layer still reflects reality spares you the awkward moment when someone asks why the dashboard numbers do not match the data export.
How this differs from unit testing
My other submission, the Semantic Model Test Harness, validates DAX against hardcoded expected values. That is unit testing: does this specific measure, with this specific filter, return this specific number?
This notebook is closer to integration testing. It validates that the semantic model agrees with its source data. The two approaches complement each other:
- Unit tests catch business logic errors (a measure formula was changed incorrectly)
- Lakehouse comparison tests catch data layer drift (the model no longer reflects what is in the tables)
Running both gives you confidence from two different angles.
Where this goes next
The test case format supports multi-row comparisons, so extending this to validate entire dimension tables (not just aggregated measures) is straightforward. I can also see connecting this to Fabric pipeline orchestration, running the comparison notebook as a post-refresh step to detect drift immediately after data lands.
Another natural extension: generating test cases automatically by introspecting the semantic model’s measures and matching them to lakehouse tables. Semantic Link Labs has functions for listing model metadata that could feed into a test case generator. I have not built that yet, but the structure is there.
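As a rough sketch of that direction (again, not something the notebook does today), sempy's list_measures could enumerate the measures that a generator would pair with templated SQL; the dataset name here is a placeholder:

import sempy.fabric as fabric

# Sketch of test-case generation from model metadata (not built yet).
# "Zendesk Reporting" is a placeholder dataset name.
measures = fabric.list_measures(dataset="Zendesk Reporting")
for _, m in measures.iterrows():
    # Each row carries the measure name and its DAX expression; a generator could
    # wrap these in EVALUATE ROW(...) and pair them with matching SQL aggregations.
    print(m["Measure Name"], "=", m["Measure Expression"])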
The notebook is submitted to the Fabric Notebook Gallery as part of the Semantic Link Developer Experience Challenge. If you are running semantic models on top of a Lakehouse and have ever wondered whether they still agree, this might save you some manual checking.