Microsoft Certified: Fabric Data Engineer Associate

I have renewed my Microsoft Certified: Fabric Data Engineer Associate certification for another year, and since this one actually required me to sit down and think rather than just click through some formalities, I thought it worth a quick note here.

The renewal is free. It is an online assessment on Microsoft Learn, not a full proctored exam. That might sound easier than the original DP-700, and in some ways it is. But the questions are updated, the skill areas evolve alongside the platform, and Microsoft has made a point of keeping the content aligned with what Fabric actually looks like today. The skills were last updated on April 20, 2026, with minor changes across most sections.


What the certification covers

The exam breaks down into three equally weighted areas, each around 30 to 35 percent of the total:

Implement and manage an analytics solution covers workspace configuration, lifecycle management with deployment pipelines and version control, security and governance at row, column, object, and folder level, and orchestration choices between Dataflows Gen2, pipelines, and notebooks.

Ingest and transform data is where most of the practical depth lives. Full and incremental load patterns, OneLake shortcuts and mirroring, PySpark, SQL, and KQL transformations, handling duplicates and late-arriving data, and streaming via Eventstreams and Spark structured streaming.

Monitor and optimize an analytics solution covers monitoring ingestion and transformation, resolving errors across pipelines, Dataflows, notebooks, Eventhouses, and Eventstreams, and performance optimization for Lakehouse tables, pipelines, data warehouses, and Spark.


Why bother renewing

Fabric moves fast. I have been working with Fabric in various capacities for a while now, and the platform I certified against originally does not look identical to what I use today. Real-Time Intelligence has matured, OneLake security has grown considerably, and the orchestration story has gotten richer. Going through the renewal assessment is a reasonable way to check whether there are gaps between what I knew and what the platform has become.

It is not a substitute for hands-on work. But it is a decent forcing function to at least read the change log and spend an hour or two with areas that have moved the most.

If you are already certified and have not renewed yet, the renewal assessment is here and the updated study guide for DP-700 is worth a read before you go in.

Semantic Sonar

Canary monitoring for Power BI semantic models

Quarterly sales season always brings the same ritual: a business user opens Power BI, clicks into the sales dashboard, and waits for the numbers to load. It has been a while since anyone looked at this particular report. The numbers look off. Someone hits refresh. The spinner turns, but the data does not budge. The last refresh was weeks ago.

That is when the emails start. “Can you check why the sales model hasn’t updated?” The support thread grows. IT investigates. The answer is usually not in the data. The refresh failed at 3 AM. Or the data source changed its connection string last Tuesday. Or someone revoked the service principal without telling anyone.

By the time the root cause is found, the report has been open in meetings for hours, and the numbers have been wrong for days.

I built Semantic Sonar because I wanted that conversation to happen at 3:01 AM when the refresh failed, not at 10:30 AM when the quarterly review is already underway. The idea is a canary: a lightweight DAX query fired at each semantic model on a schedule. If it comes back clean, the model is reachable and queryable. If it does not, something is wrong right now, not three days from now.

Registering the first model

The first page most people land on after setting up the system is the workspace browser. The alternative is typing workspace GUIDs and dataset IDs by hand, which I did exactly once before deciding to build the browser.

You pick a tenant from the dropdown, the browser calls the Power BI REST API with the tenant’s service principal, and it returns a list of workspaces. Expand a workspace and you see every semantic model in it with its display name. Click Register and a form pre-fills with the workspace ID, dataset ID, and a default DAX query:

EVALUATE ROW("Ping", 1)

That default is intentionally trivial. You want the canary to be cheap. Its job is not to test the data. Its job is to confirm that the model is reachable and that the service principal can still authenticate. A single-row EVALUATE ROW does that in under a second on any model that is working.

For most models I change the query to something slightly more meaningful: a measure that actually touches the fact table, so a permission problem on the underlying source would surface. But the principle stays the same: keep it light, keep it fast, run it often.

The interval sits at 60 minutes by default. For critical models I set it lower, 15 or 30 minutes. The minimum the system allows is 1 minute, though for most models that is more noise than signal.

The dashboard: what you see first

Once a few models are registered and the scheduler has had time to run, the dashboard starts telling its story.

Four numbers across the top: total models, active, failing, disabled. In a healthy fleet they read something like 24, 22, 1, 1. The single failing model is the one you care about. The disabled one was auto-disabled after 30 consecutive failures, which usually means it has been broken for a long time and nobody has got around to it.

Below the stat cards, there are two lists. The first is models at risk: anything with 10 or more consecutive failures. These are on a countdown. At 30 consecutive failures the system disables them automatically. The at-risk list is early warning. You have a window to investigate before the model drops off monitoring entirely.

The second list is recent failures: the last handful of failed checks across all models, with timestamps and the raw error message. That error message is usually enough to know immediately whether something needs attention now:

Power BI API returned 401: {"error":{"code":"TokenExpired",...}}

versus

Power BI API returned 404: {"error":{"code":"ItemNotFound",...}}

TokenExpired means a credential needs rotating. ItemNotFound means someone deleted or moved the model. Different problems, different urgency, different owner to call.

The dashboard auto-refreshes every 60 seconds. I leave it open on a side monitor. When the amber tile count goes up, I look.

Drilling into a model

From the dashboard or the models list, clicking through to a single model shows the full result history.

The top section repeats the key facts: current status badge (Active, Failing, Disabled), consecutive failure count, last run timestamp, last run success or failure. Below that is a latency chart. This is the view that catches the subtler problems.

A model that is technically succeeding but whose latency has climbed from 200ms to 4,000ms over a week is not a healthy model. Something upstream changed. Maybe a gateway is under new load. Maybe a DirectQuery source started doing full scans. The trend is visible here because every result gets recorded with its latency in milliseconds.

The full result history below the chart shows each check: timestamp, success or failure, latency, row count, and the first row returned as JSON. That last one is useful. If your canary query is:

EVALUATE ROW("SalesTotal", [Sum of Sales Value])

then the first row shows the actual value that came back. A value of 0 that held steady for months and then jumped to null is interesting. It does not mean the model is broken in the way the system defines it, but it means something changed.

The models list: the fleet view

The models list page is the operational view. It has search, filter by tenant, filter by status (all/active/disabled/failing), and filter by tag. Tags are free-form strings you can add to any model, so you can group by environment (production, staging), by team, or by data domain.

Sort by health score ascending and the most degraded models float to the top. The health score is a composite across consecutive failure count, recent latency relative to the model’s own baseline, and uptime across the rolling windows. It is not a formal SLA calculation; it is a signal for where to look first.

The card for each model shows the health grade (A through F), the last run time, the consecutive failure count, and a quick-run button. That last one is useful during an investigation: rather than waiting for the next scheduled run, you can trigger a check immediately and watch the result appear in the history a few seconds later.

Uptime: putting numbers on it

The Uptime page is for the conversation with stakeholders who want to know how reliable their models are in terms they can put in a report.

Each model gets three rolling windows: last 24 hours, last 7 days, last 30 days. Each window shows total checks, successes, and uptime percentage with a colour-coded bar:

  • 99.5% and above: green
  • 95 to 99.5%: yellow
  • 90 to 95%: orange
  • Below 90%: red

Across the top of the table, the aggregate uptime across all models for each window gives a fleet-level headline. “Fleet availability last 7 days: 97.4%” is a concrete number to bring to a monthly review.

The same page also shows p95 latency trends per model. Uptime tells you whether the checks are succeeding. Latency tells you whether the models that are succeeding are doing so at acceptable speed.

Radar: the visual overview

The radar page is the one I use when I want to scan the whole fleet at a glance. It renders each model as a dot on a canvas, positioned by p95 latency on one axis and health score on the other, with dot size reflecting which models have the most weight (based on how frequently they run). Colour is the health grade.

In a healthy deployment most dots cluster in the top-left: low latency, high health score, green or lime. The outliers stand out immediately. A large red dot in the bottom-right means a high-frequency model with degraded health and elevated latency. That is the one to investigate first.

Hovering a dot shows a tooltip with model name, tenant, score, grade, and current p95 in milliseconds.

Maintenance windows: the expected failures

One thing that caused false alarms early on was scheduled maintenance. The underlying SQL Server gets taken offline every Saturday between 2 and 4 AM. The canary runs every 30 minutes. The result: a string of failures, the at-risk counter climbing, an alert firing at 2:15 AM for something everyone already knew was happening.

Maintenance windows solve that. Each window has a start time, end time, and day-of-week pattern. Two flags control what happens during the window. suppressAlerts keeps running checks and recording results but does not fire webhooks. skipCanary stops running checks entirely, which is what you want when the data source is deliberately taken offline.

The scheduler checks for an active window before it enqueues a job. If skipCanary is set, it advances the next run time and skips the queue. The model’s consecutive failure counter does not increment. When the window ends, monitoring resumes from a clean state.

Webhooks: getting alerted

The dashboard is useful when you are looking at it. For everything else there are webhooks.

A webhook fires on three events: a model transitioning from healthy to failing, a model recovering after failures, and a model being auto-disabled. You configure one or more webhook URLs on a model, each with an optional shared secret. The payload is a JSON object with the event type, model name, tenant, and a details string.

Payloads are signed with HMAC-SHA256 and the signature goes into an X-Signature header, so the receiving system can verify the sender before acting on the message. Delivery is best-effort: a failed webhook delivery is logged in Application Insights but does not affect the canary result or retry behaviour.

In practice I point webhooks at a Power Automate flow that posts to Teams. A failure notification lands in the right channel within a minute of the first failed check.

The multi-tenant model

The system was built from the start to monitor multiple customers’ models from a single deployment. That shapes the credential design more than anything else.

Each customer is a tenant record in Cosmos DB: their display name, their Entra ID (the GUID of their Azure AD tenant), and the client ID of an app registration created in their tenant. The client secret for that app registration goes into Key Vault with the name tenant-{tenantId}-client-secret.

When the worker runs a check, it reads the right secret, acquires an OAuth 2.0 token scoped to the customer’s Entra tenant, and calls the Power BI executeQueries endpoint on their behalf using their own service principal. The token is cached in memory with a 5-minute buffer before expiry.

var clientSecret = await _keyVault.GetTenantClientSecretAsync(tenant.EntraId, ct);
var credential = new ClientSecretCredential(tenant.EntraId, tenant.ClientId, clientSecret);

All other service-to-service calls (the Function App talking to Cosmos DB, Key Vault, and the Storage Queue) use a system-assigned managed identity. No connection strings in configuration. The Power BI API is the one exception: it requires a credential that the customer’s tenant has consented to, and a managed identity from the hosting tenant cannot satisfy that requirement. So that leg stays on client credentials.

One edge case: live connections and SSAS

Most models work fine with the executeQueries endpoint. The ones that do not are models live-connected to Analysis Services or on-premises SSAS through a gateway, where the gateway blocks the query execution endpoint but still allows the metadata API.

For those, setting queryMode to rest on the model switches the worker to a lighter ping that calls the dataset metadata endpoint instead. It does not execute a DAX query, but it confirms that the service principal can reach the model and that the Power BI API returns a 200. If the gateway is down, that returns a 400 or 503. The rest of the pipeline, the result recording, the consecutive failure counter, the dashboard, is identical.

Audit log

The audit page records configuration changes: who added a model, who changed an interval, who disabled a tenant. It is the answer to “what changed right before the failures started.” Filter by tenant or by entity ID. The last 100 entries load by default.

I added this after the third time someone changed a model’s DAX query to something that always returns zero rows, and nobody could remember who had done it or when.

Deployment

The infrastructure is Bicep, deployed via the Azure Developer CLI:

azd auth login
azd up

That provisions everything: the Static Web App, the Function App, Cosmos DB, Key Vault, Storage, and Application Insights. The Bicep modules are in infra/modules/, and the Azure Developer CLI service definitions in azure.yaml handle the build and deploy sequence for both the .NET Functions project and the Next.js frontend.

After deployment, you add the first tenant and model through the UI, store the client secret in Key Vault, and wait for the first scheduled run. The README has the full post-deployment checklist.

What it does not do

Semantic Sonar tells you whether a model is reachable and queryable. It does not tell you whether the numbers coming out of it are correct. A model that returns [Sum of Sales Value] = 0 because the ETL truncated the fact table will pass every canary check.

The “failing” status also does not distinguish between the model being down, the underlying data source being down, and the service principal’s permissions having expired. The error message in the result history usually makes it clear, but the system does not attempt to classify the cause.

The 90-day TTL on results caps the uptime windows at 30 days. If you want longer retention you can change the TTL in CanaryResult.cs and the Cosmos DB container configuration, but the uptime calculation logic would need updating to match.

Getting the code

The repository is at github.com/vestergaardj/semantic-sonar. The README covers prerequisites, local development, and the full post-deployment walkthrough.

Developer Experience Challenge – DAX Dependency Graph

The Fabric Semantic Link Developer Experience Challenge gave me a concrete reason to build something I had been meaning to build for a while. The result is a Fabric notebook that extracts every DAX measure from a Power BI semantic model, maps measure-to-measure dependencies into a directed graph, and produces a ranked complexity audit with risk ratings. This post is about what it does, how it works, and why the approach is useful beyond the challenge context.

The problem it solves

If you have worked on a Power BI semantic model that has grown organically over a few years, you already know the problem. Measures reference other measures. Those measures reference other measures. Nobody wrote it down. The person who built the original [Gross Margin Adj YTD] has since moved on, and the name was self-explanatory at the time.

Power BI Desktop shows you a measure’s DAX expression when you click on it. It does not show you which other measures that expression depends on, how deep the dependency chain goes, or whether any measure sits in a circular reference loop that the engine has been quietly working around. Tabular Editor helps, but it still requires manual navigation. There is no built-in view that answers “what are the ten most complex measures in this model, and which ones does everything else depend on?”

That is what this notebook answers.

Getting the measures out

The notebook uses sempy.fabric.list_measures() from Semantic Link to pull every measure from the target model in a single call. It returns a pandas DataFrame with measure name, parent table, DAX expression, visibility, and description per row.

measures_df = fabric.list_measures(dataset=DATASET_NAME, workspace=WORKSPACE_NAME)

Under the hood, Semantic Link connects via the Tabular Object Model (TOM) over the XMLA endpoint. Fabric handles authentication from the notebook’s identity. Two config values is all the setup needed:

WORKSPACE_NAME = None                      # None = current workspace
DATASET_NAME   = "YourSemanticModelName"

Then Run All.

Parsing measure references: three passes

The interesting part is working out which measures each expression actually references. DAX uses square bracket notation for both measures and columns: [Total Revenue] is a measure reference, Sales[Amount] is a column reference. The parser has to distinguish them correctly.

It does this in three passes:

  1. Strip string literals ("...") to avoid false positives. A FORMAT call like FORMAT([Date], "Total Revenue") would otherwise incorrectly register a dependency on [Total Revenue].
  2. Strip single-line and multi-line comments (// and /* */).
  3. Extract all [Name] patterns where the opening bracket is not preceded by a word character, digit, or apostrophe. That lookbehind excludes table-qualified references like Sales[Amount] and 'My Table'[Column].

The extracted names are then cross-referenced against the full set of known measure names. Anything that is not a measure name is discarded.

pattern = r"(?<![a-zA-Z0-9_'])\[([^\]]+)\]"
matches = re.findall(pattern, cleaned)
return list({m for m in matches if m in measure_names})

This correctly handles the common edge cases in real models. The one known limitation: measures referenced via SELECTEDMEASURE() or through a disconnected table SWITCH pattern cannot be resolved statically. If your model uses those patterns heavily, some dependencies will be missing from the graph.

Building the graph

Once the parser has run on every expression, the dependencies go into a NetworkX directed graph. Each measure is a node. An edge A -> B means “A’s DAX expression references measure B” — A depends on B.

The graph direction is important. It lets the tool compute:

  • In-degree (fan-in): how many measures depend on this one. High fan-in means “hub” measure. Breaking it cascades everywhere.
  • Out-degree (fan-out): how many measures this one calls. High fan-out means complex composition.
  • Longest path from any node: the transitive dependency depth.
  • Cycles: circular reference chains.

From those two properties alone, the next three analyses fall out naturally.

Five analyses

Dead measures. In-degree of zero means no other measure references this one. It might be a top-level report measure used directly in a visual, or it might be genuinely unused. The notebook flags all of them; cross-referencing with report usage is a follow-up step.

Root measures. Out-degree of zero means no dependencies on other measures. These are the foundation: the SUM(Sales[Amount]) base calculations that everything else builds on. Errors in root measures propagate silently through every measure above them.

Circular references. The notebook runs Johnson’s algorithm via nx.simple_cycles() to find every elementary cycle in the graph. In a well-designed model the result is: “No circular dependencies detected.” When it is not, the full chain is printed — A -> B -> C -> A — so you know exactly what to untangle.

Complexity scoring. Each measure gets a weighted composite score across six dimensions:

DimensionWeightRationale
CALCULATE / CALCULATETABLE count3Context transitions are the primary source of subtle DAX bugs
Max parenthesis nesting depth1Readability proxy
Branching (IF / SWITCH)2Code path count
Filter functions (FILTER, ALL, ALLEXCEPT, etc.)2Filter-context manipulation
Dependency depth (longest downstream path)2Transitive error amplification
Fan-out (direct measure references)1Composition width

The weights are the part most open to debate. I gave CALCULATE the highest weight because context-transition confusion is, in my experience, where DAX models accumulate the most invisible risk. The depth and branching weights reflect that those properties make a measure harder to verify than to write.

Visual dependency DAG. The notebook renders the graph with matplotlib. Node color encodes complexity score on a green-to-red scale. Node size encodes in-degree, so hub measures are physically larger. An optional pyvis interactive version renders inline in the notebook via an iframe: zoomable, draggable, with hover tooltips showing measure name, table, and score.

The audit report

The final step consolidates everything into a single DataFrame sorted by risk rating, then by descending complexity score within each tier.

Risk logic:

  • Critical: participates in any circular reference
  • High: complexity score ≥ 20
  • Medium: score between 10 and 19
  • Low: everything else

The summary banner above the table looks like this:

============================================================
  DAX DEPENDENCY GRAPH - AUDIT SUMMARY
============================================================
  Total measures:            147
  Dependency edges:          312
  Unreferenced measures:     38
  Root measures (no deps):   22
  Circular references:       0
  High complexity (>=20):    11
  Medium complexity (10-19): 29
============================================================

That is the starting point for any refactoring conversation. Eleven measures scoring High is a concrete prioritisation signal: start there, not with the 38 unreferenced ones.

Limitations worth knowing

The parser only resolves measure-to-measure dependencies. Column-level lineage is out of scope. Calculation groups are not modeled as nodes, so models that use them heavily will have gaps. Cross-model references (composite model / DirectQuery to external datasets) are not in scope either.

The “unreferenced” flag does not mean unused. It means not referenced by other measures. A measure used directly in 15 report visuals will still show as unreferenced in this graph, because the tool has no report-level visibility. That cross-reference is worth doing separately with sempy.fabric.list_reports() if you are planning to delete anything.

Getting the notebook

The notebook is on GitHub: github.com/vestergaardj/DDG-DAX-Dependency-Graph. Upload it to any Fabric workspace, set the two configuration values in the Configuration cell, and run all. Everything else is automatic.

It requires semantic-link-sempy, networkx, and matplotlib, all of which come pre-installed in Fabric. pyvis is optional; the static graph still renders without it.

Are you still the air traffic controller?

In February 2025 I wrote about building an event-driven ETL system in Microsoft Fabric. The metaphor was air traffic control: notebooks as flights, Azure Service Bus as the control tower, the Bronze/Silver/Gold medallion layers as the runway sequence. The whole system existed because Fabric has core-based execution limits that throttle how many Spark jobs run simultaneously on a given capacity SKU.

The post was about working around a constraint. You could not just fire all your notebooks at once. You needed something to manage the queue.

More than a year on, it is worth being honest about what held up and what has changed.

The original architecture, briefly

Three components:

Azure Service Bus acted as the message queue. When a source system finished loading raw data, it dropped a message onto the bus. Each message represented one flight waiting for clearance.

A capacity monitor notebook ran on a short schedule. It checked how many notebooks were currently executing and compared that count against the available core capacity. If capacity was available, it pulled the next message from the queue and triggered the appropriate notebook via the Fabric REST API.

The processing notebooks were standard Bronze, Silver, and Gold Spark notebooks. They ran as normal Fabric notebooks with no awareness of the orchestration layer above them. On completion, they acknowledged the Service Bus message.

The deliberate design choice was to keep the notebooks clean and put the complexity in the orchestration layer. A notebook should not need to know whether it is being called by a scheduled job, a pipeline, or a service bus monitor. That separation held up well.

What has changed in Fabric

Two relevant changes since February 2025:

Native job queueing

Fabric now queues Spark jobs automatically when capacity is exhausted rather than rejecting them outright. Jobs queue in FIFO order and wait up to 24 hours before expiring. The platform starts them automatically as capacity becomes available.

There is a hard constraint that limits how far this goes: the queue depth is bounded by the SKU’s CU allocation. It is not unlimited. A sudden burst of 100+ notebooks would exceed both the concurrent execution limit and the queue depth, and the excess jobs would be rejected rather than queued — the same failure mode as before native queueing existed.

So native FIFO queueing helps if your workload arrives gradually. It does not change the original problem if your trigger pattern involves large simultaneous batches. The Service Bus buffer sits outside Fabric and has no queue depth constraint. That distinction is why the architecture is still relevant.

Job-level bursting controls

Fabric capacities support bursting at up to 3x the nominal CU allocation. You can now disable bursting for specific Spark jobs, giving finer-grained control over which jobs are allowed to consume burst headroom. Useful for ring-fencing critical workloads. This is an additive improvement to the platform regardless of which orchestration approach you use.

What held up

The decoupled architecture held up. Keeping orchestration logic out of the processing notebooks made them easier to test, modify, and redeploy independently. A notebook that does not know how it was triggered is easier to reason about than one that contains scheduling and queueing logic alongside its data transformation. Nothing about that changed.

Azure Service Bus held up as a reliable messaging backbone. At-least-once delivery, dead-letter queues, message peek-lock, and configurable time-to-live are production-grade features. There were no reliability issues with the messaging layer over the period.

The Bronze, Silver, Gold medallion structure held up. That is sound data architecture independent of the orchestration tool above it.

What I would do differently today

Not much, given the workload pattern. The original system was designed for burst scenarios, and the burst scenario has not changed. The native queue depth limit tied to CU allocation means there is still no Fabric-native replacement for an external buffer that absorbs an unbounded number of incoming messages and feeds them into Fabric at a controlled rate.

The one thing I would reconsider is the capacity monitor notebook’s polling loop. It runs on a short schedule and checks active job counts before pulling the next message. That works, but it adds latency and a scheduling dependency. Whether the Fabric REST API now exposes enough observability to build a leaner version of that loop is worth investigating — but that is an implementation detail, not a reason to replace the architecture.

The honest production reality

The system is still running. A working production system with understood failure modes and people who know how to debug it has a high replacement threshold. The architecture from February 2025 has not needed replacing because the problem it solved has not stopped existing.

The question I get asked: is there a more native way to do this now? The answer is no, not for the burst-buffering problem specifically. Fabric pipelines handle sequential orchestration well. Native FIFO queueing handles gradual workloads within its depth limit. Neither absorbs an unbounded burst and controls submission into a capacity-constrained runtime. Service Bus still does that job, and it still does it well.

The air traffic controller is still in the tower. The runway is a little wider than it was, but the traffic has grown too.

W-Order Just Dropped in Microsoft Fabric

I was poking around in the Lakehouse settings after the latest Fabric update rolled out last night, and noticed something I have not seen documented anywhere yet.

Buried in the optimization section of the Lakehouse table properties, there is a new toggle: W-Order.

If you have been following along with V-Order since Fabric went GA, you know it already does a solid job of optimizing Delta tables for read performance. W-Order is apparently the next generation. The acronym, according to the tooltip, stands for Wavelet-Optimized Recursive Delta Encoding Rewrite. It claims to apply wavelet decomposition to the Parquet column chunks and recursively re-encode them into what the UI calls “spectral micro-partitions” based on historical query access patterns.

I have no idea what half of that means. But I had to test it. Obviously.


Where to Find It

Open your Lakehouse in the Fabric portal. Navigate to Table properties on any Delta table, scroll down past the standard V-Order settings, and you should see the new section sitting right below it.

Flip the toggle, confirm the dialog, and the table enters what the UI calls a “spectral rewrite” phase. On my test table (around 48 million rows, partitioned by month), this took about four minutes. A small progress indicator shows up next to the table name in the explorer while it runs.

Spectral rewrite in progress, indicated by the spinner next to the table name

The Numbers

I ran the same aggregation query on the table before and after enabling W-Order. Same capacity, same time of day, same query, three runs each.

RunBefore (seconds)After (seconds)
114.30.34
213.80.33
314.10.34
Same query, same capacity, same data. Three consecutive runs.

That is roughly a 42x improvement. On a simple GROUP BY with a SUM. I nearly spilled my coffee.

0.34 seconds on 48 million rows. I had to run it again to believe it.

A Few Things to Note

  • Always check your runtime version first, the feature requires the April 2026 update
  • Premium or F64+ capacity is required. The toggle simply does not show up on lower SKUs
  • Reoptimization consumes CU credits, so keep an eye on your capacity metrics while the spectral rewrite runs
  • It only appears on Delta tables, not on shortcuts or mirrored tables
  • Latency on tables with heavy concurrent write loads is unknown, so proceed with some caution there

Microsoft has not published any documentation for this yet as far as I can tell. The feature might still be rolling out, so if you do not see the toggle, give it a day or two. The latest runtime update notes are here.

Go check your Lakehouse settings. And if something about this whole thing feels off, maybe take a second look at today’s date before you reorganize your entire data estate.