We have been working getting an enterprise grade event driven orchestration of our ETL system to operate like an airport control tower, managing a fleet of flights (data processes) as they progress through various stages of take-off, transit, and landing. All of this, because Microsoft Fabric has a core-based limit to the number of Notebook executions that a capacity can execute and have queued up in line for execution when invoking them using the REST API. Read the details here: limits (you know, it’s funny that there is no stated limits for Azure Service Bus Queues on number of messages in queue, but there is for Microsoft Fabric, which uses a Service Bus queue underneath…)
Each flight (file ingestion) is linked to a flight plan (orchestration) that ensures that flights follow predefined routes, encounter minimal delays, and arrive at their destination efficiently. Sometimes we have to wait for a combination of flights to arrive, before we can venture on to the next leg of the complete flight (multiple interdependent files landed in bronze before we process silver).
Each Flight Plan represents a predefined sequence of Notebook executions in Microsoft Fabric, moving data through:
- Bronze (Raw Data Ingestion)
- Silver (Data Transformation & Cleansing)
- Gold (Final High-Value Data)
Flight plans are determined using a combination of customer and some characteristics of the data.
A Finite State Machine (FSM) tracks each flight’s progress in states like Pending, Queued, Running, WaitingForDependency, and Succeeded, ensuring smooth execution and real-time tracking.
Receiving thousands of files, we are initiating a great number of Notebook executions this way using the REST API, we can easily hit the limits of Microsoft Fabric, a seat manager attends to the number of seats currently booked and will direct flights into a holding pattern if no free capacity is available. Flights enter WaitingForDependency status (runway congestion) and when a Notebook completes execution, a Notebook/Stop event is triggered, allowing the next waiting flight to take off.
A Fabric Database persists all flight plans, execution history, and dependencies, ensuring auditability, recovery, and coordination across multiple data loads. To handle all of the messaging we have chosen Azure Event Grid in combination with Azure Service Bus Queues. At the end of each queue there is an Azure Function App to process messages as they arrive.
✈ The Tower Control (Orchestration Engine)
At the heart of this system is the Tower Control, which manages and directs all incoming and outgoing flights (ETL jobs).
- It receives flight requests (messages) from the execute-queue (Azure Service Bus), ensuring that each flight follows an authorized flight plan before take-off.
- The Tower Control doesn’t make decisions alone—it relies on the Planner to chart out the appropriate flight paths and manage scheduling.
- The entire system ensures efficiency and scalability, preventing airport congestion (resource overutilization).
📅 The Planner (Orchestration Logic & SQL Database)
Before a flight can take off, it must file a flight plan—a predefined orchestration of file loads.
- Flight Plan Activation:
- When a classifier/success event arrives in the execute-queue, it first checks if a flight plan exists in the SQL Server database for the given customer-provider pair.
- If no active flight plan exists, a new one is registered and stored in the SQL database.
- Each flight plan persists in the database, allowing the system to track ongoing operations, monitor execution history, and manage dependencies.
- Flight plans are defined as templates in the meta schema, and the runtime version contains runtime information in addition to steps, workspace- and notebook-references as well as other required information to ensure the safe passage of the files.
- Flight Check-ins:
- Any subsequent flights (events) for the same customer-provider combination check into the active flight plan instead of creating a new one.
- This ensures coordination between related flights, preventing redundant take-offs and reducing airspace (resource) congestion.
- This also ensures interdependency can be enforced, so that a crash can or cannot trigger a later flight, depending on the configuration for that flight plan.
- Stored Data & Execution Logic:
- The SQL Server database keeps track of flight plans, execution states, dependencies, and processing history.
- It enables seamless recovery, auditing, and tracking of past, present, and upcoming flights.
🚦 The Flight Status System (Finite State Machine)
Every flight is tracked by a Finite State Machine (FSM) monitoring its progress in real time. The SQL database stores and updates these statuses.
- ✈ Pending – The flight request has been submitted but is waiting for the plan to commence.
- ✈ Queued – The flight is cleared for take-off and waiting in the queue.
- ✈ WaitingForDependency – The flight is delayed due to a lack of available runway space (Microsoft Fabric capacity maxed out).
- ✈ Running – The flight is airborne and actively processing data.
- ✈ Succeeded – The flight has landed safely, completing the data pipeline stage.
This FSM ensures systematic execution, prevents bottlenecks, and allows intelligent scheduling of flights.
🚀 Flight Execution (Notebook Orchestration in Microsoft Fabric)
Once a flight is cleared for take-off, it follows a prescribed route (Notebook executions in Microsoft Fabric):
- Bronze Notebook: ✈ The flight transports raw, unprocessed data into the Bronze layer.
- Silver Notebook: ✈ Data undergoes refinement, transformations, and cleansing.
- Gold Notebook: ✈ The final high-value data destination is reached, ensuring premium quality data is ready for analytics and reporting.
Each flight plan contains pre-determined orchestrations of file loads, meaning the system knows in advance which Notebooks need to be executed and in what sequence.
🛫 Managing Runway Congestion (Handling Resource Constraints in Microsoft Fabric)
Just like a busy airport can only handle a limited number of take-offs at a time, Microsoft Fabric has capacity constraints when executing Notebooks.
- If the Fabric capacity is maxed out, new flights cannot take off immediately and are moved to “WaitingForDependency” status.
- As soon as a Notebook completes execution, it sends a Notebook/Stop event, notifying the Tower Control that a slot has freed up.
- This allows one waiting flight to be cleared for take-off, ensuring fair scheduling and optimal use of available resources.
🛬 Landing Safely (Successful Data Pipeline Execution)
- Once all required Notebooks are executed successfully, the flight reaches its final destination.
- The flight’s status is updated to “Succeeded” in the SQL Server database for logging, tracking, and reporting.
- The system is now ready to accept new incoming flights and restart the process.
- For every advancement from Bronze -> Silver -> Gold there will be fired a progression event for the Tower Control to pick up.
✈ Summary of the System’s Aviation Workflow
Aviation Concept | ETL System Equivalent | Purpose |
Tower Control | Azure Function Apps | Manages flight take-off, scheduling, and execution |
Flight Plan | SQL Server Database | Stores orchestrations and execution history |
Planner | Orchestration Logic | Determines optimal execution pathways |
Flight Status (FSM) | State Machine | Tracks execution progress |
Flight Take-off | Notebook Execution | Begins processing raw data |
Flight Delay | WaitingForDependency | Notebook execution paused due to capacity constraints |
Flight Stop Event | Notebook Completion Signal | Frees up capacity for the next waiting flight |
Successful Landing | Data Processing Completed | Pipeline execution finished successfully |
The aviation theme has helped us create better mental images and relevant discussions during development.