Exploring Fabric Ontology

Note: The Fabric Ontology is currently in preview as part of the Fabric IQ workload. Features and behaviour may change before general availability.

I have been spending a little time with the Microsoft Fabric data agent documentation lately, and one pattern keeps showing up, not just in the official guidance but in community posts from people who have actually tried to deploy these things: the demo runs beautifully. The AI answers questions in plain English, leadership gets excited, the pilot gets approved. Then it hits production. Real users send real questions. The answers start drifting. Numbers that should match do not. The same question returns different results on different days. Trust evaporates faster than it was built.

And almost every time, the root cause is the same thing: the semantic foundation was not solid enough before anyone pointed an agent at it.

That is exactly the problem the Fabric Ontology is designed to address. It is the piece I think most teams will underestimate right up until the moment they need it.


Why the Data Agent Gets It Wrong

Generative AI is genuinely good at working with language and meaning. What it cannot do is fill in documentation that was never written.

Most enterprise databases were built for systems, not for consumption. Column names follow technical conventions an engineer settled on years ago. Business logic lives in a stored procedure nobody has touched since SQL Server 2014. Which customer table is the authoritative customer table? Documented nowhere. The abbreviation cust_rev_ytd_adj was obvious to the person who named it. To everyone else, including an AI agent, it is a puzzle.

When you connect an agent to that data and ask it to answer business questions, you are asking it to decode a language it was never given a dictionary for. It is not going to find meaning that was never documented. Someone has to build that foundation deliberately, before the agent gets anywhere near it.

This is not a new problem. It is the same problem that made undocumented semantic models painful for analysts, made onboarding new BI developers slow, and made “what does ‘active customer’ mean?” a recurring meeting agenda item. The AI just made it impossible to paper over.


What the Fabric Ontology Actually Is

The Fabric Ontology operates above the table and column level, at the concept level, the level where business people actually think and where agreements actually need to live.

Three building blocks:

Entity types are the real-world objects your business runs on. Customer, Order, Product, Shipment, Store. Defined once, with a stable name, description, and identifiers. Not four slightly different customer tables with different primary keys depending on which source system populated them first.

Properties are named, typed facts about an entity. Instead of a column called cust_rev_ytd_adj, you publish a Customer property called Adjusted Year-to-Date Revenue with a declared unit, a data type, and a binding to the underlying source column. Something a new analyst can understand without asking someone who remembers the original intent. Something an AI agent can reason about without guessing.

Relationships are explicit, directional, typed links between entities with cardinality rules. Customer places Order. Order contains Product. Shipment originates from Plant. Made reusable and visible, rather than buried in join logic spread across three different pipelines and a Power BI measure that no one wants to open.
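To make that concrete, here is a purely illustrative T-SQL sketch (the table and column names are invented, not taken from any real model) of how a property and a relationship typically live today, as exactly the kind of buried logic the ontology would instead name explicitly:

-- The "Adjusted Year-to-Date Revenue" property hidden behind an abbreviation,
-- and the "Customer places Order" relationship hidden inside a join
SELECT
    c.cust_id,
    c.cust_rev_ytd_adj,                  -- property: Adjusted Year-to-Date Revenue
    COUNT(o.order_id) AS order_count     -- relationship: Customer places Order (one-to-many)
FROM dbo.cust_master c
    LEFT JOIN dbo.ord_header o ON o.cust_id = c.cust_id
GROUP BY c.cust_id, c.cust_rev_ytd_adj;

In the ontology, the property name, its unit, and the join condition become published, owned definitions rather than knowledge embedded in one analyst's query.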

Those concept definitions then bind to your actual data in OneLake: lakehouse tables, Eventhouse streams, Power BI semantic models. The data bindings handle schema drift, enforce data quality checks, and track provenance at the concept layer.

The result: a shared vocabulary that both people and AI agents can reason over. When an agent is grounded in a well-defined ontology, it is not reverse-engineering meaning from raw tables. It is working from a context that someone owns and maintains.


The Ontology Graph: Relationships as Queryable Data

The Fabric Ontology also builds an ontology graph from your data bindings and relationship definitions: a queryable instance graph where entity instances are nodes and relationships are edges, each carrying metadata and data source lineage, refreshed on a schedule.

For anyone who has spent time making implicit relationships explicit and queryable, this is worth understanding. Context that previously only existed as join logic, tracing which customers are tied to which orders and which products trace back to which suppliers, becomes something you can traverse, analyze, and govern. Path finding, centrality analysis, community detection: graph algorithms applied to your actual business data.
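As a rough illustration of why that matters, here is the same kind of question, which suppliers sit behind a given customer's orders, expressed as the chain of joins you have to already know about today (again with invented table names); in the ontology graph the same traversal is a walk along named edges:

-- Illustrative only: multi-hop traversal written as explicit joins
SELECT DISTINCT
    c.CustomerId,
    s.SupplierId
FROM dbo.Customer c
    JOIN dbo.[Order]         o  ON o.CustomerId = c.CustomerId    -- Customer places Order
    JOIN dbo.OrderLine       ol ON ol.OrderId   = o.OrderId       -- Order contains Product
    JOIN dbo.Product         p  ON p.ProductId  = ol.ProductId
    JOIN dbo.ProductSupplier ps ON ps.ProductId = p.ProductId     -- Product sourced from Supplier
    JOIN dbo.Supplier        s  ON s.SupplierId = ps.SupplierId
WHERE c.CustomerId = 42;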

On top of that sits a Natural Language to Ontology (NL2Ontology) query layer that converts plain-language questions into structured queries across your bound sources, routing automatically to GQL for graph queries or KQL for Eventhouse. Not a best-effort guess at what a column might mean. Consistent answers that follow the definitions you published in your ontology.


Three Things That Actually Matter Before You Build the Agent

I have not shipped a production data agent grounded in a Fabric Ontology end-to-end yet. The feature is still in preview and I am still working through it. But the guidance is consistent enough across documentation and early community experience that I think these three things are worth naming before you start.

Build the semantic foundation first

This is the step that gets rushed. The agent is only as reliable as the context it has to work with. If your semantic model has undocumented measures, ambiguous column names, and definitions that three different teams would answer three different ways, an ontology built on top of that inherits all of it.

Before connecting an agent to your data:

  • Audit your semantic model. Are measure names self-explanatory? Are the terms your business uses defined anywhere? (One quick way to take stock is sketched just after this list.)
  • Generate a Fabric Ontology from your semantic model as a starting point. Fabric can auto-generate one to give you something concrete to refine, rather than starting from a blank canvas.
  • Write descriptions for the columns and measures that currently only make sense to the person who created them.
  • Resolve the definitions that lack consensus before rollout, not after. “What counts as an active customer?” is not a technical question. It is a business alignment question. It needs an answer before the agent encounters it at 9am from a business stakeholder.
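For that first audit step, if you can reach the model through its XMLA endpoint (for example with SSMS or DAX Studio), the Analysis Services DMVs give you a quick inventory of measures and their descriptions. A minimal sketch, assuming XMLA access is enabled on your capacity:

-- List every measure and its description; blank descriptions are the ones to document first
SELECT [Name], [Description], [Expression]
FROM $SYSTEM.TMSCHEMA_MEASURES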

Keep each agent’s scope narrow

The temptation is to build one agent that answers everything. It almost always underperforms. The more data an agent has to reason over per question, the harder it is for it to return consistent answers.

A sales agent. An inventory agent. A finance agent. Each one is easier to configure, easier to test, and easier for the people who rely on it to trust, because the scope is legible and the owner is clear.

Start with one domain. The one where trust matters most and the semantic definitions are clearest. Do it properly. Let that one earn credibility before expanding.

Write the instructions like you are briefing a smart new colleague

Data agents are probabilistic: they use statistical reasoning to determine the most likely answer. Business users expect deterministic behavior, meaning the same question should return the same answer every time. Detailed agent instructions are the primary lever for closing that gap.

Think of it as the standing brief you would write for a new analyst on their first day: here is what matters, here is how we define things, here is what belongs out of scope, and here is what to do when a question is ambiguous.

For your most critical business questions, Fabric data agents support sample questions with pre-defined SQL, DAX, or KQL behind them, removing the probabilistic element entirely for those specific scenarios. Use it. Treat the instructions as a living document and update them as you learn how people in your organization actually phrase questions.
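As a hedged illustration of what such a pairing could look like, with an entirely invented schema, a sample question and its pre-defined query might be registered along these lines:

-- Sample question: "What was adjusted year-to-date revenue by region last quarter?"
-- Pre-defined SQL attached to that question, so the agent does not have to improvise:
SELECT
    r.RegionName,
    SUM(c.AdjustedYtdRevenue) AS AdjustedYtdRevenue
FROM dbo.CustomerRevenue c
    JOIN dbo.Region r ON r.RegionKey = c.RegionKey
WHERE c.FiscalQuarter = 'FY26-Q2'   -- "last quarter" pinned to an explicit value
GROUP BY r.RegionName;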


Where I Am On This

The hard part of building reliable AI over enterprise data is not the model. It is the semantic gap between raw data structures and the meaning business users expect the model to already know. The Fabric Ontology looks like the most direct thing Microsoft has shipped to address that gap at the platform level. That is what makes it worth paying attention to, even while it is in preview.

I am still early in exploring this and plan to dig further as it moves toward GA. If you have already started building with it, whether you found a workflow that clicked, hit a wall, or worked around something unexpected, I would genuinely like to hear about it in the comments.

Materialized Lake Views: Now Generally Available

I’m excited to share that materialized lake views, announced at the FabCon Atlanta conference, are now generally available in Microsoft Fabric.

Reference:
Materialized lake views are also now generally available, simplifying medallion architecture implementation in Spark SQL and PySpark and enabling always up-to-date pipelines with no manual orchestration. 
FabCon and SQLCon 2026: Unifying databases and Fabric on a single data platform | Microsoft Azure Blog

As a SQL developer working with large-scale analytics, I’m always looking for ways to simplify data pipelines and boost query performance. Recently, I’ve been exploring materialized lake views in Microsoft Fabric, and I want to share how they can transform the way we manage and serve data in a lakehouse environment.

Why Materialized Lake Views?

If you’ve ever built reporting datasets or complex aggregations in a lakehouse, you know the pain: Spark notebooks for transformations, pipelines for scheduling, and a lot of manual orchestration to keep everything fresh and consistent. Materialized lake views change the game by letting you define your transformations in SQL, then letting Fabric handle execution, storage, and refresh.

The result? Fast, query-ready assets persisted as Delta tables in OneLake, automatically refreshed on a schedule or when source data changes. No more worrying about refresh logic or execution order—just focus on your SQL.

When Should You Use Them?

I reach for materialized lake views when I need:

  • Frequently accessed aggregations (like daily sales or monthly metrics)
  • Complex joins across large tables that need consistent, up-to-date results
  • Declarative data quality rules (think: “drop rows where sales_amount <= 0”)
  • Reporting datasets that combine multiple sources and need automatic refresh
  • Medallion architecture (bronze → silver → gold) defined in SQL

They’re not for everything. For one-off queries, simple, fast transformations, or non-SQL logic (like ML or Python), I stick with notebooks or other tools.

How Do They Work?

It’s simple: write a SQL query, and Fabric materializes the result as a Delta table. When source data changes, Fabric figures out the optimal refresh—incremental, full, or skip. You query the view like any other table, and built-in monitoring lets you track refreshes, data quality, and dependencies.

My Favorite Features

  • Automatic refresh optimization: Only new or changed data is processed when possible.
  • Built-in data quality: Add constraints right in your SQL, like:
  CONSTRAINT valid_sales CHECK (sales_amount > 0) ON MISMATCH DROP
  • Dependency management: Fabric handles execution order for you, even across chained views.
  • Monitoring: Track refresh status, data quality, and lineage in one place.

Real-World Example

Here’s how I create a daily sales summary that always stays fresh:

CREATE MATERIALIZED LAKE VIEW daily_sales AS
SELECT 
    DATE(order_date) as sale_date,
    region,
    SUM(amount) as total_sales,
    COUNT(*) as order_count
FROM orders 
GROUP BY DATE(order_date), region;
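Building on that, here is a rough sketch of a chained view with a data quality constraint, combining the dependency management and CONSTRAINT ... ON MISMATCH DROP features mentioned above (the names are illustrative):

CREATE MATERIALIZED LAKE VIEW regional_sales
(
    CONSTRAINT positive_sales CHECK (total_sales > 0) ON MISMATCH DROP
)
AS
SELECT
    region,
    SUM(total_sales) AS total_sales,
    SUM(order_count) AS total_orders
FROM daily_sales          -- Fabric tracks this dependency and refreshes daily_sales first
GROUP BY region;

Because regional_sales reads from daily_sales, Fabric works out the execution order across the two views for you.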

Limitations

Right now, cross-lakehouse lineage and execution aren’t supported, and for high-frequency streaming or non-SQL logic, you’ll want other tools.

Final Thoughts

Materialized lake views have made my data pipelines simpler, faster, and more reliable. If you’re building analytics in Microsoft Fabric, give them a try; you’ll spend less time orchestrating and more time delivering insights.

Planning in Microsoft Fabric IQ for SQL Developers

If you work with SQL in Fabric, you already know the pattern. Reporting data lands in Fabric SQL or Lakehouse tables, semantic models sit on top, and Power BI turns that into something business users understand. Planning has always been the odd one out, usually living in a separate tool that IT feeds with exports and integrations. Not anymore…

Announced at FabCon today (March 18th, 2026)
If you haven’t already, check out Arun Ulag’s hero blog “FabCon and SQLCon 2026: Unifying databases and Fabric on a single, complete platform” for a complete look at all of our FabCon and SQLCon announcements across both Fabric and our database offerings. 

With Planning in Microsoft Fabric IQ, that separation disappears. Planning now sits directly on top of the same Fabric SQL and semantic models that developers and data engineers already maintain.

Ref: Introducing Planning in Microsoft Fabric IQ: From historical data to forecasting the future | Microsoft Fabric Blog | Microsoft Fabric

For SQL professionals, this unlocks a much cleaner architecture, less integration work and more predictable data flows.


Planning Uses Your Existing Semantic Models

Planning in Fabric IQ reads business logic directly from Power BI semantic models. SQL developers no longer have to replicate definitions or maintain special “planning exports.”

If your measures, reference tables and dimensions are modeled correctly, Planning uses them as its foundation. This eliminates drift between planning logic and reporting logic, something that has always been painful in disconnected systems.


Writeback Lands in Fabric SQL

The biggest operational change for SQL developers is this: planning writeback does not land in a proprietary planning database. It lands in Fabric SQL tables that you control and can query.

This means:

  • Forecasts and budgets are stored as regular tables
  • Versioning and governance policies work the same way as other data
  • You can join planning data with operational data without a round trip to another tool
  • Downstream BI reports update automatically because the data lives in the same environment

A typical structure might look like:

SELECT
    d.CustomerKey,
    f.ForecastAmount,
    f.Version,
    a.ActualSales
FROM dbo.FinanceForecast f
    JOIN dbo.DimCustomer d ON d.CustomerKey = f.CustomerKey
    JOIN dbo.SalesActuals a ON a.CustomerKey = f.CustomerKey
WHERE f.Version = '2026-Base'

No more external APIs, sync jobs or file drops.


OneLake Makes Planning Data Instantly Available

Shortcuts and mirroring remove a major pain point for SQL developers. Data used by planners does not need a dedicated pipeline or internal copy. If the data is already in OneLake, Planning can use it immediately.

This avoids:

  • Daily ETL loads
  • Redundant staging areas
  • Manual reconciliation work

For SQL developers, this is more predictable, more consistent and easier to operate.


A Single Environment From Actuals to Forecasts

Traditionally, SQL teams have had to maintain two parallel worlds.
One world holds actuals and historical performance.
Another world holds budgets and planning data from a separate system.

Planning in Fabric IQ merges these worlds:

  • Actuals remain in Lakehouse and Fabric SQL
  • Plans and scenarios write back into Fabric SQL
  • Semantic models unify both
  • Power BI reports read everything from the same data estate

This reduces the number of moving parts and makes lineage, governance and validation much easier.
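Reusing the illustrative table names from the writeback example earlier, reconciling actuals against a plan version becomes an ordinary query in the same environment (a sketch, not a prescribed schema):

SELECT
    a.CustomerKey,
    a.ActualSales,
    f.ForecastAmount,
    a.ActualSales - f.ForecastAmount AS Variance
FROM dbo.SalesActuals a
    JOIN dbo.FinanceForecast f ON f.CustomerKey = a.CustomerKey
WHERE f.Version = '2026-Base'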


Planning as a New Input to AI and Automation

Planning introduces a new type of data for SQL developers to work with: intent data.
Targets, constraints, scenario assumptions and expected outcomes become tables that intelligent agents can read.

This shifts planning from an isolated workflow to a central part of automated decision support. For SQL developers, this means new opportunities to model features, feed scoring pipelines and support decision logic with richer context than just historical data.
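To make "intent data" concrete, this is roughly the shape such a table could take; the names are hypothetical and not a Fabric IQ schema:

CREATE TABLE dbo.PlanningAssumption (
    ScenarioName   NVARCHAR(100) NOT NULL,   -- e.g. '2026-Base'
    TargetMetric   NVARCHAR(100) NOT NULL,   -- e.g. 'Revenue Growth %'
    TargetValue    DECIMAL(18, 4) NOT NULL,
    EffectiveFrom  DATE           NOT NULL,
    Assumption     NVARCHAR(400)  NULL       -- the stated reasoning behind the target
)

An agent or scoring pipeline can read targets and assumptions from a table like this alongside the historical data it already uses.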


Why SQL Developers Should Care

Planning in Fabric IQ is worth paying attention to because it simplifies several long-standing operational challenges:

  • No more pipeline maintenance to feed external planning tools
  • No more reconciliation work between planning and reporting
  • No more duplicated metrics in separate systems
  • No separate planning database that IT cannot fully control
  • Writeback is now a standard Fabric SQL operation
  • Planning logic aligns with semantic model logic

This is a major improvement for anyone who has had to support planning workflows while also maintaining clean, governed SQL environments.


Getting Started

Planning in Fabric IQ is available in preview. To experiment:

  1. Open a Fabric workspace
  2. Connect Planning to an existing semantic model
  3. Observe how writeback lands in Fabric SQL
  4. Integrate planning data with your existing T-SQL workload

From there, it becomes clear that planning is no longer an external dependency. It is part of the platform, and SQL developers can finally treat it like any other governed dataset.

Reversing a semantic model w/ incremental refresh using Claude

Ever faced the issue of having to download a semantic model from the Power BI service but getting stuck when incremental refresh has been implemented?

Usually it’s a rule of thumb to keep the original .pbix file to allow for any modifications in Power BI Desktop, but sometimes that file is lost, misplaced or somehow no longer available. While there are some workarounds blogged out there, I thought I’d give Claude a test on converting a .bim file to a .pbip project. A .bim file is obtainable using Tabular Editor, which you can find as a free version here, or a paid version here (additional features). Connect to the live semantic model and select Save As (model.bim).

Once the model.bim is on your computer, you can release Claude on it to convert it into a .pbip project that you can open and manage through Power BI Desktop.

NB!
Incremental refresh must be re-configured once the model is deployed to the service, and a full refresh is required.

Below is the complete prompt that I had Claude generate once we were through some hiccups. For context, these are the error messages we hit along the way before the prompt reached its final form:

  • Property ‘datasetReference’ has not been defined and schema does not allow additional properties. Path ‘datasetReference’, line 3, position 21
  • Cannot read ‘C:\<path removed>\model.bim’. Missing required artifact ‘model.bim’.
  • DatasetDefinition: Required artifact is missing in ‘C:\<path removed>\definition.pbism’.
  • Only text with UTF8 encoding without BOM (byte order marks) is supported. Detected BOM: ‘UTF-8’

Prompt begin: Convert a .bim file into a .pbip (Power BI Project) structure.

Analyze the model.bim file

Given the model.bim file (SSAS Tabular / Power BI semantic model JSON), create a valid .pbip project that opens in Power BI Desktop. Follow these exact rules:

Folder structure

<ProjectName>.pbip
<ProjectName>.SemanticModel/
    .platform
    definition.pbism
    model.bim
<ProjectName>.Report/
    .platform
    definition.pbir

File contents

<ProjectName>.pbip

{
  "version": "1.0",
  "artifacts": [
    {
      "report": {
        "path": "<ProjectName>.Report"
      }
    }
  ],
  "settings": {
    "enableAutoRecovery": true
  }
}

<ProjectName>.SemanticModel/definition.pbism

{
  "version": "1.0",
  "settings": {}
}

<ProjectName>.SemanticModel/.platform

{
  "$schema": "https://developer.microsoft.com/json-schemas/fabric/gitIntegration/platformProperties/2.0.0/schema.json",
  "metadata": {
    "type": "SemanticModel",
    "displayName": "<ProjectName>"
  },
  "config": {
    "version": "2.0",
    "logicalId": "<generate-a-new-guid>"
  }
}

<ProjectName>.Report/definition.pbir

{
  "version": "4.0",
  "datasetReference": {
    "byPath": {
      "path": "../<ProjectName>.SemanticModel"
    },
    "byConnection": null
  }
}

<ProjectName>.Report/.platform

{
  "$schema": "https://developer.microsoft.com/json-schemas/fabric/gitIntegration/platformProperties/2.0.0/schema.json",
  "metadata": {
    "type": "Report",
    "displayName": "<ProjectName>"
  },
  "config": {
    "version": "2.0",
    "logicalId": "<generate-a-different-guid>"
  }
}

<ProjectName>.SemanticModel/model.bim — the original .bim file, placed here unchanged.

Critical rules

  1. model.bim goes directly in the <ProjectName>.SemanticModel/ folder — NOT in a definition/ subfolder. The definition/ subfolder is only for TMDL format.
  2. All files must be UTF-8 without BOM (no byte order mark). Use new System.Text.UTF8Encoding(false) or equivalent.
  3. definition.pbism must NOT contain a datasetReference property — that property belongs only in .pbir files. The .pbism schema only allows version and settings.
  4. definition.pbir must reference the SemanticModel via relative path using "byPath": { "path": "../<ProjectName>.SemanticModel" }.
  5. The .pbip file is the entry point — users double-click this to open in Power BI Desktop.
  6. The .platform files contain Fabric Git integration metadata. The logicalId GUIDs are placeholders that get regenerated on deployment.
  7. Derive <ProjectName> from the "name" property at the root of the .bim JSON.

If the model uses incremental refresh

If the .bim contains a table with refreshPolicy and policyRange partitions, and the goal is to convert to standard import mode:

  • Remove the refreshPolicy object from the table
  • Replace all policyRange partitions with a single "type": "m" partition containing the M query from the refreshPolicy.sourceExpression, but with the RangeStart/RangeEnd filter removed
  • Remove the RangeStart and RangeEnd tables from model.tables

Let me know in the comments if you find this useful, not working at all, or any experiences in between 🙂

ETL Orchestration: Air Traffic Control for Data

We have been working on an enterprise-grade, event-driven orchestration of our ETL system that operates like an airport control tower, managing a fleet of flights (data processes) as they progress through the stages of take-off, transit, and landing. All of this because Microsoft Fabric has a core-based limit on the number of Notebook executions a capacity can run and hold queued when they are invoked through the REST API. Read the details here: limits (you know, it’s funny that there are no stated limits on the number of messages in an Azure Service Bus queue, but there are for Microsoft Fabric, which uses a Service Bus queue underneath…)

Each flight (file ingestion) is linked to a flight plan (orchestration) that ensures that flights follow predefined routes, encounter minimal delays, and arrive at their destination efficiently. Sometimes we have to wait for a combination of flights to arrive, before we can venture on to the next leg of the complete flight (multiple interdependent files landed in bronze before we process silver).

Each Flight Plan represents a predefined sequence of Notebook executions in Microsoft Fabric, moving data through:

  1. Bronze (Raw Data Ingestion)
  2. Silver (Data Transformation & Cleansing)
  3. Gold (Final High-Value Data)

Flight plans are determined using a combination of customer and some characteristics of the data.

A Finite State Machine (FSM) tracks each flight’s progress in states like Pending, Queued, Running, WaitingForDependency, and Succeeded, ensuring smooth execution and real-time tracking.

Receiving thousands of files, we initiate a great number of Notebook executions this way through the REST API, so we can easily hit the limits of Microsoft Fabric. A seat manager therefore keeps track of the number of seats currently booked and directs flights into a holding pattern if no free capacity is available. Flights enter WaitingForDependency status (runway congestion), and when a Notebook completes execution, a Notebook/Stop event is triggered, allowing the next waiting flight to take off.

A Fabric Database persists all flight plans, execution history, and dependencies, ensuring auditability, recovery, and coordination across multiple data loads. To handle all of the messaging we have chosen Azure Event Grid in combination with Azure Service Bus Queues. At the end of each queue there is an Azure Function App to process messages as they arrive.
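As a minimal sketch of the kind of tables involved (our real schema has more to it, and the names here are simplified for illustration), the flight plans and flight states could be persisted roughly like this:

CREATE TABLE dbo.FlightPlan (
    FlightPlanId  INT IDENTITY PRIMARY KEY,
    Customer      NVARCHAR(100) NOT NULL,
    Provider      NVARCHAR(100) NOT NULL,
    TemplateName  NVARCHAR(200) NOT NULL,                         -- meta-schema template this instance came from
    Status        NVARCHAR(20)  NOT NULL,                         -- e.g. Active, Completed
    CreatedAt     DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);

CREATE TABLE dbo.Flight (
    FlightId      INT IDENTITY PRIMARY KEY,
    FlightPlanId  INT NOT NULL REFERENCES dbo.FlightPlan(FlightPlanId),
    Stage         NVARCHAR(10)  NOT NULL,                         -- Bronze, Silver or Gold
    Status        NVARCHAR(30)  NOT NULL,                         -- Pending, Queued, Running, WaitingForDependency, Succeeded
    NotebookRef   UNIQUEIDENTIFIER NULL,                          -- the Fabric Notebook to execute for this leg
    UpdatedAt     DATETIME2     NOT NULL DEFAULT SYSUTCDATETIME()
);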


The Tower Control (Orchestration Engine)

At the heart of this system is the Tower Control, which manages and directs all incoming and outgoing flights (ETL jobs).

  • It receives flight requests (messages) from the execute-queue (Azure Service Bus), ensuring that each flight follows an authorized flight plan before take-off.
  • The Tower Control doesn’t make decisions alone—it relies on the Planner to chart out the appropriate flight paths and manage scheduling.
  • The entire system ensures efficiency and scalability, preventing airport congestion (resource overutilization).

📅 The Planner (Orchestration Logic & SQL Database)

Before a flight can take off, it must file a flight plan—a predefined orchestration of file loads.

  1. Flight Plan Activation:
    • When a classifier/success event arrives in the execute-queue, it first checks if a flight plan exists in the SQL Server database for the given customer-provider pair.
    • If no active flight plan exists, a new one is registered and stored in the SQL database.
    • Each flight plan persists in the database, allowing the system to track ongoing operations, monitor execution history, and manage dependencies.
    • Flight plans are defined as templates in the meta schema, and the runtime version contains runtime information in addition to steps, workspace and notebook references, as well as other required information to ensure the safe passage of the files.
  2. Flight Check-ins:
    • Any subsequent flights (events) for the same customer-provider combination check into the active flight plan instead of creating a new one.
    • This ensures coordination between related flights, preventing redundant take-offs and reducing airspace (resource) congestion.
    • This also ensures that interdependencies can be enforced, so that a failed flight does or does not trigger a later flight, depending on the configuration for that flight plan.
  3. Stored Data & Execution Logic:
    • The SQL Server database keeps track of flight plans, execution states, dependencies, and processing history.
    • It enables seamless recovery, auditing, and tracking of past, present, and upcoming flights.
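The flight plan activation step in point 1 could be sketched roughly like this in T-SQL, against the illustrative tables shown earlier (our real implementation differs in the details):

-- Register a flight plan for the customer-provider pair unless one is already active
IF NOT EXISTS (
    SELECT 1
    FROM dbo.FlightPlan
    WHERE Customer = @Customer
      AND Provider = @Provider
      AND Status   = 'Active'
)
    INSERT INTO dbo.FlightPlan (Customer, Provider, TemplateName, Status)
    VALUES (@Customer, @Provider, @TemplateName, 'Active');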

🚦 The Flight Status System (Finite State Machine)

Every flight is tracked by a Finite State Machine (FSM) monitoring its progress in real time. The SQL database stores and updates these statuses.

  • Pending – The flight request has been submitted but is waiting for the plan to commence.
  • Queued – The flight is cleared for take-off and waiting in the queue.
  • ✈ WaitingForDependency – The flight is delayed due to a lack of available runway space (Microsoft Fabric capacity maxed out).
  • Running – The flight is airborne and actively processing data.
  • Succeeded – The flight has landed safely, completing the data pipeline stage.

This FSM ensures systematic execution, prevents bottlenecks, and allows intelligent scheduling of flights.


🚀 Flight Execution (Notebook Orchestration in Microsoft Fabric)

Once a flight is cleared for take-off, it follows a prescribed route (Notebook executions in Microsoft Fabric):

  1. Bronze Notebook: ✈ The flight transports raw, unprocessed data into the Bronze layer.
  2. Silver Notebook: ✈ Data undergoes refinement, transformations, and cleansing.
  3. Gold Notebook: ✈ The final high-value data destination is reached, ensuring premium quality data is ready for analytics and reporting.

Each flight plan contains pre-determined orchestrations of file loads, meaning the system knows in advance which Notebooks need to be executed and in what sequence.


🛫 Managing Runway Congestion (Handling Resource Constraints in Microsoft Fabric)

Just like a busy airport can only handle a limited number of take-offs at a time, Microsoft Fabric has capacity constraints when executing Notebooks.

  • If the Fabric capacity is maxed out, new flights cannot take off immediately and are moved to “WaitingForDependency” status.
  • As soon as a Notebook completes execution, it sends a Notebook/Stop event, notifying the Tower Control that a slot has freed up.
  • This allows one waiting flight to be cleared for take-off, ensuring fair scheduling and optimal use of available resources.
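A minimal sketch of the seat manager’s check, again against the illustrative tables above and with the capacity limit as a parameter, might look like this:

DECLARE @MaxConcurrentNotebooks INT = 10;   -- placeholder; the real limit depends on your capacity SKU (see the limits link above)

-- Count seats currently taken
DECLARE @Running INT =
    (SELECT COUNT(*) FROM dbo.Flight WHERE Status = 'Running');

-- If a seat is free, clear the longest-waiting flight for take-off
IF @Running < @MaxConcurrentNotebooks
    UPDATE f
    SET    f.Status = 'Queued'
    FROM   dbo.Flight AS f
    WHERE  f.FlightId = (
        SELECT TOP (1) FlightId
        FROM dbo.Flight
        WHERE Status = 'WaitingForDependency'
        ORDER BY UpdatedAt
    );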

🛬 Landing Safely (Successful Data Pipeline Execution)

  • Once all required Notebooks are executed successfully, the flight reaches its final destination.
  • The flight’s status is updated to “Succeeded” in the SQL Server database for logging, tracking, and reporting.
  • The system is now ready to accept new incoming flights and restart the process.
  • For every advancement from Bronze -> Silver -> Gold, a progression event is fired for the Tower Control to pick up.

Summary of the System’s Aviation Workflow

Aviation Concept | ETL System Equivalent | Purpose
Tower Control | Azure Function Apps | Manages flight take-off, scheduling, and execution
Flight Plan | SQL Server Database | Stores orchestrations and execution history
Planner | Orchestration Logic | Determines optimal execution pathways
Flight Status (FSM) | State Machine | Tracks execution progress
Flight Take-off | Notebook Execution | Begins processing raw data
Flight Delay | WaitingForDependency | Notebook execution paused due to capacity constraints
Flight Stop Event | Notebook Completion Signal | Frees up capacity for the next waiting flight
Successful Landing | Data Processing Completed | Pipeline execution finished successfully

The aviation theme has helped us create better mental images and relevant discussions during development.