Microsoft Certified: Fabric Data Engineer Associate

I have renewed my Microsoft Certified: Fabric Data Engineer Associate certification for another year, and since this one actually required me to sit down and think rather than just click through some formalities, I thought it worth a quick note here.

The renewal is free. It is an online assessment on Microsoft Learn, not a full proctored exam. That might sound easier than the original DP-700, and in some ways it is. But the questions are updated, the skill areas evolve alongside the platform, and Microsoft has made a point of keeping the content aligned with what Fabric actually looks like today. The skills were last updated on April 20, 2026, with minor changes across most sections.


What the certification covers

The exam breaks down into three equally weighted areas, each around 30 to 35 percent of the total:

Implement and manage an analytics solution covers workspace configuration, lifecycle management with deployment pipelines and version control, security and governance at row, column, object, and folder level, and orchestration choices between Dataflows Gen2, pipelines, and notebooks.

Ingest and transform data is where most of the practical depth lives. Full and incremental load patterns, OneLake shortcuts and mirroring, PySpark, SQL, and KQL transformations, handling duplicates and late-arriving data, and streaming via Eventstreams and Spark structured streaming.

Monitor and optimize an analytics solution covers monitoring ingestion and transformation, resolving errors across pipelines, Dataflows, notebooks, Eventhouses, and Eventstreams, and performance optimization for Lakehouse tables, pipelines, data warehouses, and Spark.


Why bother renewing

Fabric moves fast. I have been working with Fabric in various capacities for a while now, and the platform I certified against originally does not look identical to what I use today. Real-Time Intelligence has matured, OneLake security has grown considerably, and the orchestration story has gotten richer. Going through the renewal assessment is a reasonable way to check whether there are gaps between what I knew and what the platform has become.

It is not a substitute for hands-on work. But it is a decent forcing function to at least read the change log and spend an hour or two with areas that have moved the most.

If you are already certified and have not renewed yet, the renewal assessment is here and the updated study guide for DP-700 is worth a read before you go in.

Semantic Sonar

Canary monitoring for Power BI semantic models

Quarterly sales season always brings the same ritual: a business user opens Power BI, clicks into the sales dashboard, and waits for the numbers to load. It has been a while since anyone looked at this particular report. The numbers look off. Someone hits refresh. The spinner turns, but the data does not budge. The last refresh was weeks ago.

That is when the emails start. “Can you check why the sales model hasn’t updated?” The support thread grows. IT investigates. The answer is usually not in the data. The refresh failed at 3 AM. Or the data source changed its connection string last Tuesday. Or someone revoked the service principal without telling anyone.

By the time the root cause is found, the report has been open in meetings for hours, and the numbers have been wrong for days.

I built Semantic Sonar because I wanted that conversation to happen at 3:01 AM when the refresh failed, not at 10:30 AM when the quarterly review is already underway. The idea is a canary: a lightweight DAX query fired at each semantic model on a schedule. If it comes back clean, the model is reachable and queryable. If it does not, something is wrong right now, not three days from now.

Registering the first model

The first page most people land on after setting up the system is the workspace browser. The alternative is typing workspace GUIDs and dataset IDs by hand, which I did exactly once before deciding to build the browser.

You pick a tenant from the dropdown, the browser calls the Power BI REST API with the tenant’s service principal, and it returns a list of workspaces. Expand a workspace and you see every semantic model in it with its display name. Click Register and a form pre-fills with the workspace ID, dataset ID, and a default DAX query:

EVALUATE ROW("Ping", 1)

That default is intentionally trivial. You want the canary to be cheap. Its job is not to test the data. Its job is to confirm that the model is reachable and that the service principal can still authenticate. A single-row EVALUATE ROW does that in under a second on any model that is working.

For most models I change the query to something slightly more meaningful: a measure that actually touches the fact table, so a permission problem on the underlying source would surface. But the principle stays the same: keep it light, keep it fast, run it often.

The interval sits at 60 minutes by default. For critical models I set it lower, 15 or 30 minutes. The minimum the system allows is 1 minute, though for most models that is more noise than signal.

The dashboard: what you see first

Once a few models are registered and the scheduler has had time to run, the dashboard starts telling its story.

Four numbers across the top: total models, active, failing, disabled. In a healthy fleet they read something like 24, 22, 1, 1. The single failing model is the one you care about. The disabled one was auto-disabled after 30 consecutive failures, which usually means it has been broken for a long time and nobody has got around to it.

Below the stat cards, there are two lists. The first is models at risk: anything with 10 or more consecutive failures. These are on a countdown. At 30 consecutive failures the system disables them automatically. The at-risk list is early warning. You have a window to investigate before the model drops off monitoring entirely.

The second list is recent failures: the last handful of failed checks across all models, with timestamps and the raw error message. That error message is usually enough to know immediately whether something needs attention now:

Power BI API returned 401: {"error":{"code":"TokenExpired",...}}

versus

Power BI API returned 404: {"error":{"code":"ItemNotFound",...}}

TokenExpired means a credential needs rotating. ItemNotFound means someone deleted or moved the model. Different problems, different urgency, different owner to call.

The dashboard auto-refreshes every 60 seconds. I leave it open on a side monitor. When the amber tile count goes up, I look.

Drilling into a model

From the dashboard or the models list, clicking through to a single model shows the full result history.

The top section repeats the key facts: current status badge (Active, Failing, Disabled), consecutive failure count, last run timestamp, last run success or failure. Below that is a latency chart. This is the view that catches the subtler problems.

A model that is technically succeeding but whose latency has climbed from 200ms to 4,000ms over a week is not a healthy model. Something upstream changed. Maybe a gateway is under new load. Maybe a DirectQuery source started doing full scans. The trend is visible here because every result gets recorded with its latency in milliseconds.

The full result history below the chart shows each check: timestamp, success or failure, latency, row count, and the first row returned as JSON. That last one is useful. If your canary query is:

EVALUATE ROW("SalesTotal", [Sum of Sales Value])

then the first row shows the actual value that came back. A value of 0 that held steady for months and then jumped to null is interesting. It does not mean the model is broken in the way the system defines it, but it means something changed.

The models list: the fleet view

The models list page is the operational view. It has search, filter by tenant, filter by status (all/active/disabled/failing), and filter by tag. Tags are free-form strings you can add to any model, so you can group by environment (production, staging), by team, or by data domain.

Sort by health score ascending and the most degraded models float to the top. The health score is a composite across consecutive failure count, recent latency relative to the model’s own baseline, and uptime across the rolling windows. It is not a formal SLA calculation; it is a signal for where to look first.

The card for each model shows the health grade (A through F), the last run time, the consecutive failure count, and a quick-run button. That last one is useful during an investigation: rather than waiting for the next scheduled run, you can trigger a check immediately and watch the result appear in the history a few seconds later.

Uptime: putting numbers on it

The Uptime page is for the conversation with stakeholders who want to know how reliable their models are in terms they can put in a report.

Each model gets three rolling windows: last 24 hours, last 7 days, last 30 days. Each window shows total checks, successes, and uptime percentage with a colour-coded bar:

  • 99.5% and above: green
  • 95 to 99.5%: yellow
  • 90 to 95%: orange
  • Below 90%: red

Across the top of the table, the aggregate uptime across all models for each window gives a fleet-level headline. “Fleet availability last 7 days: 97.4%” is a concrete number to bring to a monthly review.

The same page also shows p95 latency trends per model. Uptime tells you whether the checks are succeeding. Latency tells you whether the models that are succeeding are doing so at acceptable speed.

Radar: the visual overview

The radar page is the one I use when I want to scan the whole fleet at a glance. It renders each model as a dot on a canvas, positioned by p95 latency on one axis and health score on the other, with dot size reflecting which models have the most weight (based on how frequently they run). Colour is the health grade.

In a healthy deployment most dots cluster in the top-left: low latency, high health score, green or lime. The outliers stand out immediately. A large red dot in the bottom-right means a high-frequency model with degraded health and elevated latency. That is the one to investigate first.

Hovering a dot shows a tooltip with model name, tenant, score, grade, and current p95 in milliseconds.

Maintenance windows: the expected failures

One thing that caused false alarms early on was scheduled maintenance. The underlying SQL Server gets taken offline every Saturday between 2 and 4 AM. The canary runs every 30 minutes. The result: a string of failures, the at-risk counter climbing, an alert firing at 2:15 AM for something everyone already knew was happening.

Maintenance windows solve that. Each window has a start time, end time, and day-of-week pattern. Two flags control what happens during the window. suppressAlerts keeps running checks and recording results but does not fire webhooks. skipCanary stops running checks entirely, which is what you want when the data source is deliberately taken offline.

The scheduler checks for an active window before it enqueues a job. If skipCanary is set, it advances the next run time and skips the queue. The model’s consecutive failure counter does not increment. When the window ends, monitoring resumes from a clean state.

Webhooks: getting alerted

The dashboard is useful when you are looking at it. For everything else there are webhooks.

A webhook fires on three events: a model transitioning from healthy to failing, a model recovering after failures, and a model being auto-disabled. You configure one or more webhook URLs on a model, each with an optional shared secret. The payload is a JSON object with the event type, model name, tenant, and a details string.

Payloads are signed with HMAC-SHA256 and the signature goes into an X-Signature header, so the receiving system can verify the sender before acting on the message. Delivery is best-effort: a failed webhook delivery is logged in Application Insights but does not affect the canary result or retry behaviour.

In practice I point webhooks at a Power Automate flow that posts to Teams. A failure notification lands in the right channel within a minute of the first failed check.

The multi-tenant model

The system was built from the start to monitor multiple customers’ models from a single deployment. That shapes the credential design more than anything else.

Each customer is a tenant record in Cosmos DB: their display name, their Entra ID (the GUID of their Azure AD tenant), and the client ID of an app registration created in their tenant. The client secret for that app registration goes into Key Vault with the name tenant-{tenantId}-client-secret.

When the worker runs a check, it reads the right secret, acquires an OAuth 2.0 token scoped to the customer’s Entra tenant, and calls the Power BI executeQueries endpoint on their behalf using their own service principal. The token is cached in memory with a 5-minute buffer before expiry.

var clientSecret = await _keyVault.GetTenantClientSecretAsync(tenant.EntraId, ct);
var credential = new ClientSecretCredential(tenant.EntraId, tenant.ClientId, clientSecret);

All other service-to-service calls (the Function App talking to Cosmos DB, Key Vault, and the Storage Queue) use a system-assigned managed identity. No connection strings in configuration. The Power BI API is the one exception: it requires a credential that the customer’s tenant has consented to, and a managed identity from the hosting tenant cannot satisfy that requirement. So that leg stays on client credentials.

One edge case: live connections and SSAS

Most models work fine with the executeQueries endpoint. The ones that do not are models live-connected to Analysis Services or on-premises SSAS through a gateway, where the gateway blocks the query execution endpoint but still allows the metadata API.

For those, setting queryMode to rest on the model switches the worker to a lighter ping that calls the dataset metadata endpoint instead. It does not execute a DAX query, but it confirms that the service principal can reach the model and that the Power BI API returns a 200. If the gateway is down, that returns a 400 or 503. The rest of the pipeline, the result recording, the consecutive failure counter, the dashboard, is identical.

Audit log

The audit page records configuration changes: who added a model, who changed an interval, who disabled a tenant. It is the answer to “what changed right before the failures started.” Filter by tenant or by entity ID. The last 100 entries load by default.

I added this after the third time someone changed a model’s DAX query to something that always returns zero rows, and nobody could remember who had done it or when.

Deployment

The infrastructure is Bicep, deployed via the Azure Developer CLI:

azd auth login
azd up

That provisions everything: the Static Web App, the Function App, Cosmos DB, Key Vault, Storage, and Application Insights. The Bicep modules are in infra/modules/, and the Azure Developer CLI service definitions in azure.yaml handle the build and deploy sequence for both the .NET Functions project and the Next.js frontend.

After deployment, you add the first tenant and model through the UI, store the client secret in Key Vault, and wait for the first scheduled run. The README has the full post-deployment checklist.

What it does not do

Semantic Sonar tells you whether a model is reachable and queryable. It does not tell you whether the numbers coming out of it are correct. A model that returns [Sum of Sales Value] = 0 because the ETL truncated the fact table will pass every canary check.

The “failing” status also does not distinguish between the model being down, the underlying data source being down, and the service principal’s permissions having expired. The error message in the result history usually makes it clear, but the system does not attempt to classify the cause.

The 90-day TTL on results caps the uptime windows at 30 days. If you want longer retention you can change the TTL in CanaryResult.cs and the Cosmos DB container configuration, but the uptime calculation logic would need updating to match.

Getting the code

The repository is at github.com/vestergaardj/semantic-sonar. The README covers prerequisites, local development, and the full post-deployment walkthrough.

Ten Years as a Microsoft MVP

Ten years. I keep counting it on my fingers like it might come out different the next time. It does not.

In early 2017, Microsoft awarded me the Data Platform MVP for the first time. I had been speaking at community events for about four years at that point, starting with a shaky user group presentation in Copenhagen and gradually building up to SQLSaturdays across Europe. When the notification mail arrived I read it three times. Then I called my wife. Then I read it again.

What I did not anticipate was how much the platform itself would transform in the decade ahead, or how completely my own technical identity would change alongside it. This is a reflection on both.

2016: On-premises and proud of it

When the MVP journey began, my world was SQL Server. Specifically, Analysis Services Multidimensional. I wrote MDX for a living, tuned partition strategies, scripted deployments in PowerShell, and built dimension security models with AMO. The toolchain was mature, the community was tight, and the assumption was that your servers lived in a rack somewhere you could point to.

Azure existed, of course. It was about six years old. But for most of us in the BI world it still felt like someone else’s problem. Azure SQL Database was limited. Azure Analysis Services was in preview. The idea that a serious analytical workload could run entirely in the cloud was something you heard at conference keynotes and politely ignored on Monday morning.

I remember attending a PASS Summit session around that time where someone demoed Azure Data Lake by showing a slide deck of screenshots. With errors in them. That was the state of cloud analytics in our corner of the Microsoft ecosystem. Promising on paper. Not quite there in practice.

2017 to 2020: The cloud came whether we were ready or not

Each year I renewed the MVP award, Azure had added another service that pulled more of my on-premises workload toward the cloud. Azure Data Factory replaced chunks of what SSIS used to do. Azure Synapse started absorbing the data warehouse conversation. Managed identities appeared, and with them a new category of debugging I had never encountered before: authentication that worked in test and mysteriously failed in production with no clear error message.

The learning curve was not optional. I recertified Azure exams not for the credential itself but because the platform had changed enough that I needed to confirm I still understood what I was deploying. Every renewal cycle felt like a different job.

During this same stretch, Power BI went from “promising” to “the center of gravity for everything Microsoft BI.” It shipped in mid-2015, and within two years the conversation at every community event had shifted from “should we look at Power BI?” to “how do we govern hundreds of workspaces?” I took over running the Danish Power BI User Group and watched it grow past 2,000 members. The problems I solved daily moved from MDX tuning to DAX optimization, from partition scripting to deployment pipelines and semantic model management. Different skill set. Same instinct: figure it out, test it, share what happened.

2021 to 2023: The convergence

Then the pieces started merging. Azure Synapse tried to be the unified analytics service. Power BI Premium brought dataset hosting into the cloud. Microsoft kept drawing a tighter circle around what had been separate products, pushing toward something integrated.

And then Fabric arrived.

Announced at Build in May 2023, generally available by November that year. Within months it had absorbed the Power BI service, introduced Lakehouses, OneLake as a unified storage layer, Data Pipelines, and Notebooks as a first-class authoring experience. The pitch was simple: one platform for data engineering, data science, real-time analytics, and business intelligence. No more stitching five Azure services together and hoping the authentication tokens lined up.

2024 to now: A different job

The speed of Fabric’s evolution catches me off guard sometimes. I spent years becoming deeply skilled in SSAS Multidimensional. That knowledge still informs how I think about models, but it is no longer the center of my work. The architecture underneath shifted fundamentally. Lakehouses replaced cubes. Notebooks replaced XMLA scripts. Semantic models became the new version of what we used to call datasets, which themselves replaced what we used to call cubes.

Each rename is not just marketing. Each one reflects a genuine change in how the technology works and what you can build with it.

Today I write Python in Fabric Notebooks to snapshot and diff semantic model metadata. Materialized Lake Views went GA. Fabric IQ brought SQL-familiar tooling to the Lakehouse. I have found myself using AI to reverse-engineer Power BI models and building open-source tooling on top of the Tabular Object Model through semantic-link-labs. The problems I solve in 2026 look nothing like the problems I solved in 2016.

What stayed the same

The community.

That is the thread that runs through all ten years without interruption. The tools changed completely. The community changed shape too, but it did not break. PASS dissolved. SQLSaturday became Data Saturday. The #sqlfamily hashtag gave way to broader data community networks. But the people kept showing up.

The person who initially nominated me for the MVP award: Mark Broadbent (l|b|m|x). The organizer who let me both speak and help run events for years. The group of Danes who went to dinner together after Summit sessions in Seattle. The local organizers in Dublin, Gothenburg, Chicago, Prague, Pittsburgh, New York, Krakov, Stockholm, London, Newport, Cambridge, Hanau, Munich, Thorshavn, Utrecht…, who each time made space for someone to come talk about whatever they had just figured out. Every one of those moments is part of this decade.

I still organize Data Saturday Denmark. Some years it goes smoothly. Some years 95 people register and do not show up, and you have to write the uncomfortable post about it, with new rules and transparent reasoning, because that is what organizing actually looks like: not just running the event but protecting its future.

The parallel timelines

If I map my MVP journey against the platform’s, the parallels are hard to miss:

When I got the award, the platform was SQL Server with Azure as a sidecar. My expertise was multidimensional cubes and MDX. The community gathered under the PASS umbrella.

Five years in, the platform was Azure-first with Power BI as the analytical surface. My expertise had shifted to data modeling, deployment pipelines, and cloud administration. The community was regrouping after PASS shut down.

Now, at ten years, the platform is Fabric. My expertise is Lakehouses, semantic models, Notebooks, and AI-assisted development. The community runs on Data Saturdays, LinkedIn, and local user groups that feel smaller but more committed.

Three different jobs descriptions (same place) over one continuous award. The MVP title stayed the same on paper. Everything underneath it turned over at least twice.

What I take from it

I do not have a tidy lesson. I have an observation. The practitioners who stayed relevant through these shifts were not the ones who predicted the roadmap correctly. They were the ones who kept testing things with real data, publishing their mistakes, and showing up to events where they might learn something new. That is what I tried to do. Some years better than others.

Ten years. Same instinct, completely different platform.

Semantic Model Diff

…a change tracker for Power BI semantic models

Semantic models do not have version control. Not really. You can store .pbip files in Git, and Tabular Editor gives you a .bim file you can diff, but neither of those workflows answers the simplest question a team asks after an update cycle: what changed?

I do not mean “which file was touched.” I mean: which measures were modified, which columns were added, which relationships were removed, and what exactly is different in the DAX expression that someone edited last Tuesday. That is the question I kept running into, and the one I built this notebook to answer.

This is my third submission to the Fabric Semantic Link Developer Experience Challenge. The first two (a DAX unit test harness and a lakehouse validation notebook) test whether the model produces correct results. This one tests whether the model itself has changed, and tells you precisely how.

The problem

Every Power BI developer has been in this situation. A semantic model gets promoted from dev to production. Something breaks. A report shows unexpected numbers, or a relationship is missing, or a column that used to exist is gone. The first question is always: “What changed since last time?”

Without a structured answer, the debugging starts. Someone opens Tabular Editor and clicks through tables. Someone else opens the old .bim file in a text editor and does a manual comparison. If the model has 40 tables and 200 measures, that process takes long enough that you start questioning your career choices.

I wanted something I could run in a Fabric notebook that would snapshot the full model metadata, store it as JSON, and then produce a colour-coded diff between any two snapshots. No manual clicking. No text-editor heroics.

How it works

The notebook has two engines: a snapshot engine and a diff engine.

The snapshot engine connects to a semantic model via the Tabular Object Model (TOM), exposed through semantic-link-labs, and captures everything: tables, columns, measures, relationships, partitions, roles, and shared expressions. Each object gets recorded with all its properties. A table entry includes its name, description, hidden flag, data category, and type (regular table, calculated table, or calculation group). A measure entry includes the full DAX expression, format string, display folder, and whether it has a KPI attached.

The snapshot is saved as a plain JSON file in the attached Lakehouse’s Files area. File names encode the dataset name and UTC timestamp, so snapshots sort chronologically and you can keep as many as you need.

The diff engine takes any two snapshots and compares them object by object. Each object type has a natural key (for tables, the name; for columns, the table plus column name; for relationships, the relationship name). Objects are categorised as Added, Removed, or Modified. For modified objects, the engine reports exactly which properties changed and shows the old and new values side by side.

What gets captured

The snapshot covers seven object types:

  • Tables: name, description, hidden, data category, type
  • Columns: data type, format string, calculated column expression, display folder, sort-by column, summarize-by
  • Measures: DAX expression, format string, display folder, KPI presence
  • Relationships: from/to table and column, active flag, cross-filter direction, cardinality
  • Partitions: mode, source type, query or M expression
  • Roles: RLS row filter expressions per table
  • Shared expressions: M parameters and Power Query definitions

That last category matters more than you might expect. Shared expressions include connection strings and parameter values. If someone changed the data source path in a parameter, this diff catches it.

The diff output

The diff produces a summary table first: one row per object type, with counts of added, removed, and modified objects. In a clean promotion where nothing changed, all counts are zero and you get a green “No changes detected” message. That is the happy path.

When changes exist, the notebook renders a colour-coded DataFrame per object type. Green rows for additions, red for removals, amber for modifications. Modified rows show the specific property, old value, and new value on the same line. If a measure’s DAX expression changed, you see the full old expression and the full new expression right there.

The notebook also exports the full diff as a self-contained HTML file. I use that for sharing with team members who are not running notebooks. Drop it in a Teams channel or attach it to a pull request.

Using it

The workflow is four steps:

  1. Take a baseline snapshot before making changes
  2. Edit the model (in Power BI Desktop, Fabric Model View, Tabular Editor, wherever)
  3. Take a new snapshot
  4. Compare the two and review the diff

Configuration is two variables: DATASET (the name or GUID of the semantic model) and WORKSPACE (optional, defaults to the notebook’s workspace). Everything else runs from those values. Authentication is handled by Fabric, so there are no credentials to manage.

If you already have two snapshot files and just want to compare them, you can skip straight to step 4. The notebook auto-selects the two most recent snapshots if you do not specify file paths explicitly.

Why not just diff the .bim file?

A .bim file is JSON, so you can run a text diff on it. I have done that. The problem is that a .bim file is a single monolithic document. A text diff on a 15,000-line JSON file where someone reordered a few properties is not a useful diff. You get hundreds of lines of noise for a single meaningful change.

The structured diff in this notebook compares at the object level, not the text level. If a measure’s expression changed, you see that one measure and just the changed expression. Everything else is filtered out. The signal-to-noise ratio is massively better.

The .bim diff also does not work if you are comparing across environments. If you want to verify that a dev model matches production before promotion, you need to snapshot both and compare the metadata, not the files. This notebook handles that.

Where this goes next

Running this in a Fabric pipeline as a post-publish step is the obvious next move. Snapshot the model after every deployment, compare against the previous snapshot, and fail the pipeline (or post a Teams notification) if objects were unexpectedly removed.

Cross-environment comparison is another use case I am already running: snapshot Dev and Prod on the same day, diff them, and confirm that what was promoted is exactly what was intended.

A more ambitious extension would be inline DAX expression diffs using a line-diff algorithm instead of full expression replacement. The structure supports it. I have not built that part yet.

The notebook is submitted to the Fabric Notebook Gallery as part of the Semantic Link Developer Experience Challenge. If you have ever spent an afternoon trying to figure out what changed in a semantic model, this might save you the detective work.

Source code available here: vestergaardj/Semantic-Link-TestHarness: Semantic Model Test Harness: Unit & Regression Testing for DAX (using Semantic Link Labs)

Validating DAX Against Your Lakehouse with Semantic Link

A semantic model is a promise. It promises that the numbers in your reports match the data in your lakehouse. But after enough model changes, renamed columns, new relationships, and tweaked measures, that promise gets harder to verify. I wanted a way to check it programmatically.

This is my second submission to the Fabric Semantic Link Developer Experience Challenge. The first was a DAX unit test harness that compares measures against hardcoded expected values. That works well for known business rules, but it has a limitation: someone has to decide and maintain what the “right” answer is. For a model with hundreds of measures across dozens of filter contexts, that does not scale.

So I built something different. Instead of hardcoding expected values, I use the Lakehouse as the ground truth.

The idea

If your semantic model sits on top of a Fabric Lakehouse, then both the DAX layer and the SQL layer should agree on the same numbers. A COUNTROWS('fact_ticket_metrics') in DAX should return the same count as SELECT COUNT(*) FROM Gold.fact_ticket_metrics in Spark SQL. If they diverge, something changed in the model that needs attention.

The notebook takes pairs of queries: one DAX, one SQL. It executes both, normalizes the results into comparable DataFrames, and reports pass or fail. The Lakehouse is the single source of truth.

How it works

Each test case is a dictionary with a description, a DAX query, and a SQL query. Optionally, you can specify sort columns, a floating-point tolerance, or a column mapping if the names differ between the two result sets.

test_cases = [
    {
        "description": "Row count tickets",
        "dax_query": """
            EVALUATE
            ROW("RowCount", COUNTROWS('fact_ticket_metrics'))
        """,
        "sql_query": """
            SELECT COUNT(*) AS RowCount
            FROM Gold.fact_ticket_metrics
        """,
    },
    {
        "description": "Total # Incidents",
        "dax_query": """
            EVALUATE
            ROW("fact_ticket_metrics", [Incidents])
        """,
        "sql_query": """
            SELECT COUNT(id) AS Incidents
            FROM Gold.fact_ticket_metrics
        """,
        "tolerance": 0.01,
    },
]

The harness loops through each test case, executes the DAX query via sempy_labs.evaluate_dax_impersonation, executes the SQL query via Spark, and then compares the two result DataFrames.

Column normalization

DAX returns column names in the format 'Table'[Column]. SQL returns plain column names. If you want to compare them, those names need to match.

A small regex function strips DAX table qualifiers: 'Sales'[Amount] becomes Amount, and [Amount] also becomes Amount. After normalization, the harness aligns both DataFrames on their common columns, sorted alphabetically. Any extra columns on either side get flagged as warnings but do not block the comparison.

If the normalized names still do not match (say the DAX column is RowCount but the SQL column is row_count), you can pass a column_mapping dictionary to handle the translation explicitly.

Floating-point tolerance

Not every number comparison should demand exact equality. Aggregations in DAX and Spark can produce slightly different floating-point results depending on processing order and precision. The harness uses numpy.isclose with a configurable relative tolerance (default: 0.0001) for numeric columns. String columns are compared as exact matches.

When a numeric mismatch exceeds the tolerance, the harness reports the specific column, row, DAX value, and SQL value. It caps the output at five mismatches per column to keep the report readable when something is broadly wrong.

The comparison engine

The comparison works at the DataFrame level, not just scalar values. This matters because many useful validation queries return multiple rows: a count per category, a sum per month, a distinct count per workspace. Scalar-only testing misses structural issues like missing rows or extra groupings.

The engine does three things in sequence:

  1. Aligns both DataFrames on their common columns and sorts them consistently
  2. Compares numeric columns with tolerance, string columns with exact match
  3. Collects mismatches into a diff DataFrame for inspection

A shape mismatch (different number of rows or columns) is an immediate failure. You get the exact dimensions from both sides so you know whether the issue is missing data or a query that groups differently.

What I tested against

I ran this against a Zendesk reporting model that sits on a Gold-layer lakehouse. The model has ticket metrics, incident counts, and support analytics. The test cases validated that the semantic model’s row counts and measure aggregations matched the underlying SQL tables.

This is the kind of model where schema drift is common. New ticket categories get added, fields get renamed upstream, and the Gold layer evolves. Having an automated check that the DAX layer still reflects reality saves the awkward moment when someone asks why the dashboard numbers do not match the data export.

How this differs from unit testing

My other submission, the Semantic Model Test Harness, validates DAX against hardcoded expected values. That is unit testing: does this specific measure, with this specific filter, return this specific number?

This notebook is closer to integration testing. It validates that the semantic model agrees with its source data. The two approaches complement each other:

  • Unit tests catch business logic errors (a measure formula was changed incorrectly)
  • Lakehouse comparison tests catch data layer drift (the model no longer reflects what is in the tables)

Running both gives you confidence from two different angles.

Where this goes next

The test case format supports multi-row comparisons, so extending this to validate entire dimension tables (not just aggregated measures) is straightforward. I can also see connecting this to Fabric pipeline orchestration, running the comparison notebook as a post-refresh step to detect drift immediately after data lands.

Another natural extension: generating test cases automatically by introspecting the semantic model’s measures and matching them to lakehouse tables. Semantic Link Labs has functions for listing model metadata that could feed into a test case generator. I have not built that yet, but the structure is there.

The notebook is submitted to the Fabric Notebook Gallery as part of the Semantic Link Developer Experience Challenge. If you are running semantic models on top of a Lakehouse and have ever wondered whether they still agree, this might save you some manual checking.