ETL Orchestration: Air Traffic Control for Data

We have been working on getting an enterprise-grade, event-driven orchestration of our ETL system to operate like an airport control tower, managing a fleet of flights (data processes) as they progress through the various stages of take-off, transit, and landing. All of this because Microsoft Fabric has a core-based limit on the number of Notebook executions a capacity can run and keep queued when they are invoked through the REST API. Read the details here: limits (you know, it’s funny that there are no stated limits on the number of messages in an Azure Service Bus Queue, but there are for Microsoft Fabric, which uses a Service Bus queue underneath…)

Each flight (file ingestion) is linked to a flight plan (orchestration) that ensures flights follow predefined routes, encounter minimal delays, and arrive at their destination efficiently. Sometimes we have to wait for a combination of flights to arrive before we can venture on to the next leg of the complete flight (multiple interdependent files must land in bronze before we process silver).

Each Flight Plan represents a predefined sequence of Notebook executions in Microsoft Fabric, moving data through:

  1. Bronze (Raw Data Ingestion)
  2. Silver (Data Transformation & Cleansing)
  3. Gold (Final High-Value Data)

Flight plans are determined by a combination of the customer and certain characteristics of the data.

A Finite State Machine (FSM) tracks each flight’s progress in states like Pending, Queued, Running, WaitingForDependency, and Succeeded, ensuring smooth execution and real-time tracking.

Receiving thousands of files, we initiate a great number of Notebook executions through the REST API and can easily hit the limits of Microsoft Fabric. A seat manager therefore keeps track of the number of seats currently booked and directs flights into a holding pattern if no free capacity is available. Such flights enter the WaitingForDependency status (runway congestion), and when a Notebook completes execution, a Notebook/Stop event is triggered, allowing the next waiting flight to take off.

A Fabric Database persists all flight plans, execution history, and dependencies, ensuring auditability, recovery, and coordination across multiple data loads. To handle all of the messaging we have chosen Azure Event Grid in combination with Azure Service Bus Queues. At the end of each queue there is an Azure Function App to process messages as they arrive.
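To give an impression of how this looks in code, a message handler at the end of the execute-queue could be sketched roughly like the following. This is a minimal sketch using the Azure Functions isolated worker model; the connection setting name and the hand-off to the orchestration logic are placeholders, not our actual implementation.

using System.Threading.Tasks;
using Microsoft.Azure.Functions.Worker;
using Microsoft.Extensions.Logging;

public class ExecuteQueueProcessor
{
    private readonly ILogger<ExecuteQueueProcessor> _logger;

    public ExecuteQueueProcessor(ILogger<ExecuteQueueProcessor> logger)
    {
        _logger = logger;
    }

    // Fires for every message that lands in the execute-queue.
    [Function("ExecuteQueueProcessor")]
    public async Task Run(
        [ServiceBusTrigger("execute-queue", Connection = "ServiceBusConnection")] string message)
    {
        _logger.LogInformation("Flight request received: {Message}", message);

        // Hand the flight request over to the Tower Control / Planner logic
        // (placeholder for the internal orchestration call).
        await Task.CompletedTask;
    }
}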


The Tower Control (Orchestration Engine)

At the heart of this system is the Tower Control, which manages and directs all incoming and outgoing flights (ETL jobs).

  • It receives flight requests (messages) from the execute-queue (Azure Service Bus), ensuring that each flight follows an authorized flight plan before take-off.
  • The Tower Control doesn’t make decisions alone—it relies on the Planner to chart out the appropriate flight paths and manage scheduling.
  • The entire system ensures efficiency and scalability, preventing airport congestion (resource overutilization).

📅 The Planner (Orchestration Logic & SQL Database)

Before a flight can take off, it must file a flight plan—a predefined orchestration of file loads.

  1. Flight Plan Activation:
    • When a classifier/success event arrives in the execute-queue, the system first checks whether a flight plan already exists in the SQL Server database for the given customer-provider pair (see the sketch after this list).
    • If no active flight plan exists, a new one is registered and stored in the SQL database.
    • Each flight plan persists in the database, allowing the system to track ongoing operations, monitor execution history, and manage dependencies.
    • Flight plans are defined as templates in the meta schema, and the runtime version contains, in addition to the steps, workspace and notebook references as well as other runtime information required to ensure the safe passage of the files.
  2. Flight Check-ins:
    • Any subsequent flights (events) for the same customer-provider combination check into the active flight plan instead of creating a new one.
    • This ensures coordination between related flights, preventing redundant take-offs and reducing airspace (resource) congestion.
    • This also ensures that interdependencies can be enforced, so that a crash either does or does not trigger a later flight, depending on the configuration for that flight plan.
  3. Stored Data & Execution Logic:
    • The SQL Server database keeps track of flight plans, execution states, dependencies, and processing history.
    • It enables seamless recovery, auditing, and tracking of past, present, and upcoming flights.
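A minimal sketch of the activation step could look like this. The runtime.FlightPlan table, its columns, and the 'Active' status value are hypothetical stand-ins for our actual meta schema; the real implementation also copies steps, workspace and notebook references from the template.

using Microsoft.Data.SqlClient;

public static int GetOrCreateFlightPlan(string connectionString, string customer, string provider)
{
    using var conn = new SqlConnection(connectionString);
    conn.Open();

    // Check for an already active flight plan for this customer-provider pair.
    using var find = new SqlCommand(
        "SELECT TOP 1 FlightPlanId FROM runtime.FlightPlan " +
        "WHERE Customer = @customer AND Provider = @provider AND Status = 'Active'", conn);
    find.Parameters.AddWithValue("@customer", customer);
    find.Parameters.AddWithValue("@provider", provider);

    var existing = find.ExecuteScalar();
    if (existing != null)
        return (int)existing;                 // check in to the active flight plan

    // No active plan: register a new one (hypothetical schema; the real version
    // copies steps and notebook references from the template in the meta schema).
    using var create = new SqlCommand(
        "INSERT INTO runtime.FlightPlan (Customer, Provider, Status) " +
        "OUTPUT INSERTED.FlightPlanId VALUES (@customer, @provider, 'Active')", conn);
    create.Parameters.AddWithValue("@customer", customer);
    create.Parameters.AddWithValue("@provider", provider);

    return (int)create.ExecuteScalar();
}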

🚦 The Flight Status System (Finite State Machine)

Every flight is tracked by a Finite State Machine (FSM) monitoring its progress in real time. The SQL database stores and updates these statuses.

  • Pending – The flight request has been submitted but is waiting for the plan to commence.
  • Queued – The flight is cleared for take-off and waiting in the queue.
  • ✈ WaitingForDependency – The flight is delayed due to a lack of available runway space (Microsoft Fabric capacity maxed out).
  • Running – The flight is airborne and actively processing data.
  • Succeeded – The flight has landed safely, completing the data pipeline stage.

This FSM ensures systematic execution, prevents bottlenecks, and allows intelligent scheduling of flights.
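A minimal sketch of how the FSM can be expressed in code is shown below. The statuses mirror the list above; the allowed transitions are simplified for illustration and leave out failure handling.

using System;
using System.Collections.Generic;

public enum FlightStatus
{
    Pending,
    Queued,
    WaitingForDependency,
    Running,
    Succeeded
}

public static class FlightStateMachine
{
    // Allowed transitions between statuses (simplified for illustration).
    private static readonly Dictionary<FlightStatus, FlightStatus[]> Allowed = new()
    {
        [FlightStatus.Pending]              = new[] { FlightStatus.Queued },
        [FlightStatus.Queued]               = new[] { FlightStatus.Running, FlightStatus.WaitingForDependency },
        [FlightStatus.WaitingForDependency] = new[] { FlightStatus.Queued },
        [FlightStatus.Running]              = new[] { FlightStatus.Succeeded },
        [FlightStatus.Succeeded]            = Array.Empty<FlightStatus>()
    };

    public static bool CanTransition(FlightStatus from, FlightStatus to) =>
        Allowed.TryGetValue(from, out var next) && Array.IndexOf(next, to) >= 0;
}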


🚀 Flight Execution (Notebook Orchestration in Microsoft Fabric)

Once a flight is cleared for take-off, it follows a prescribed route (Notebook executions in Microsoft Fabric):

  1. Bronze Notebook: ✈ The flight transports raw, unprocessed data into the Bronze layer.
  2. Silver Notebook: ✈ Data undergoes refinement, transformations, and cleansing.
  3. Gold Notebook: ✈ The final high-value data destination is reached, ensuring premium quality data is ready for analytics and reporting.

Each flight plan contains pre-determined orchestrations of file loads, meaning the system knows in advance which Notebooks need to be executed and in what sequence.
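For illustration, starting a single leg of the journey (one Notebook execution) via the REST API could look roughly like this. The sketch assumes the Fabric “run on demand item job” endpoint and a pre-acquired bearer token; the workspace and notebook ids would come from the flight plan.

using System;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

public static async Task StartNotebookAsync(
    HttpClient http, string accessToken, Guid workspaceId, Guid notebookId)
{
    // Fabric "run on demand item job" endpoint (assumed; see the Fabric REST API docs).
    var url = $"https://api.fabric.microsoft.com/v1/workspaces/{workspaceId}" +
              $"/items/{notebookId}/jobs/instances?jobType=RunNotebook";

    using var request = new HttpRequestMessage(HttpMethod.Post, url);
    request.Headers.Authorization = new AuthenticationHeaderValue("Bearer", accessToken);

    // A 202 Accepted response means Fabric has queued the notebook run.
    var response = await http.SendAsync(request);
    response.EnsureSuccessStatusCode();
}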


🛫 Managing Runway Congestion (Handling Resource Constraints in Microsoft Fabric)

Just like a busy airport can only handle a limited number of take-offs at a time, Microsoft Fabric has capacity constraints when executing Notebooks.

  • If the Fabric capacity is maxed out, new flights cannot take off immediately and are moved to “WaitingForDependency” status.
  • As soon as a Notebook completes execution, it sends a Notebook/Stop event, notifying the Tower Control that a slot has freed up.
  • This allows one waiting flight to be cleared for take-off, ensuring fair scheduling and optimal use of available resources (sketched in code below).
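Conceptually, the seat manager behaves like the sketch below. This is simplified; in our implementation the seat bookkeeping and the queue of waiting flights are persisted in the database rather than held in memory.

using System.Collections.Generic;

public class SeatManager
{
    private readonly int _maxSeats;                         // Fabric capacity: max concurrent Notebook runs
    private readonly Queue<string> _waitingFlights = new(); // flights in WaitingForDependency
    private int _bookedSeats;

    public SeatManager(int maxSeats) => _maxSeats = maxSeats;

    // Called when a flight requests clearance for take-off.
    public bool TryBookSeat(string flightId)
    {
        if (_bookedSeats < _maxSeats)
        {
            _bookedSeats++;
            return true;                                    // cleared for take-off
        }

        _waitingFlights.Enqueue(flightId);                  // holding pattern
        return false;
    }

    // Called when a Notebook/Stop event arrives: free the seat and
    // return the next waiting flight (if any) so it can be cleared.
    public string ReleaseSeat()
    {
        if (_bookedSeats > 0) _bookedSeats--;
        return _waitingFlights.Count > 0 ? _waitingFlights.Dequeue() : null;
    }
}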

🛬 Landing Safely (Successful Data Pipeline Execution)

  • Once all required Notebooks are executed successfully, the flight reaches its final destination.
  • The flight’s status is updated to “Succeeded” in the SQL Server database for logging, tracking, and reporting.
  • The system is now ready to accept new incoming flights and restart the process.
  • For every advancement from Bronze -> Silver -> Gold, a progression event is fired for the Tower Control to pick up.

Summary of the System’s Aviation Workflow

Aviation Concept | ETL System Equivalent | Purpose
Tower Control | Azure Function Apps | Manages flight take-off, scheduling, and execution
Flight Plan | SQL Server Database | Stores orchestrations and execution history
Planner | Orchestration Logic | Determines optimal execution pathways
Flight Status (FSM) | State Machine | Tracks execution progress
Flight Take-off | Notebook Execution | Begins processing raw data
Flight Delay | WaitingForDependency | Notebook execution paused due to capacity constraints
Flight Stop Event | Notebook Completion Signal | Frees up capacity for the next waiting flight
Successful Landing | Data Processing Completed | Pipeline execution finished successfully

The aviation theme has helped us create better mental images and relevant discussions during development.


OneLake – External Data Sharing

At #MSIgnite Microsoft announced a new feature in Fabric that allows people from one organization to share data with people from another organization. You might ask yourself why this is even news, and rightly so. Up until last week, professionals have had to use tools such as (S)FTP clients (FileZilla and the like), Azure Storage Explorer, WeTransfer, or similar products in order to share data. Some of these tools are in fact hard to use and/or understand for a great number of business users – they are familiar with Windows and the Office suite and not much more. This is all to be expected, as business users in general should focus on business stuff rather than IT stuff.

As of last week this picture has changed quite dramatically, as Microsoft has introduced what they refer to as External Data Sharing in Microsoft Fabric. Even though this new feature involves some configuration from the IT department, once it’s set up the end user can actually be allowed to share data with external organizations through what looks to be the File Explorer! 🔥 At least it looks like the File Explorer, but it is in fact another application end users will need to install on top to enable this functionality. The tool is called OneLake File Explorer and is, obviously, a file explorer for OneLake in Microsoft Fabric. In the following diagram, Microsoft demonstrates the feature and even underlines that no data is copied from one tenant to the other – all data is shared in place.

Think about it just one more time – the end user will be able to, on their own device, copy and paste data from local folders to OneLake-synchronized folders (also on their own device), which then get synchronized to another tenant. The tool works just like the OneDrive application, which means that it keeps files in sync between your device and OneLake.

Admin Settings in Tenant A

Configuring the functionality requires the sharing organization (Tenant A) to toggle a setting in their Fabric Admin section.

The setting “External Data Sharing” should be enabled, and it is recommended to restrict it to a specific security group for easier management of access by the IT department.

As per the screenshot above, members of the security group “CatMan” are the only ones who are allowed to share externally. One note, highlighted in the yellow box, might be worth considering before using this feature.

The functionality will work, even if the receiving organization (Tenant B) does not allow sharing as described above.

Sharing from Tenant A

Suppose you already have a lakehouse in Microsoft Fabric (otherwise here’s a great introduction on how to create one), and you want to share files or tables with an external business user or IT professional. The following steps will allow you to do just that.

I have uploaded my Important Business Numbers.xlsx spreadsheet to my folder File_Share. I need this file for my critical workloads in my BI analysis, but I also want to share these numbers with a professional outside my organization.

From inside the workspace in Tenant A I can now (due to the configuration in the admin portal) choose to share data externally by clicking the three dots (…) on the lakehouse in question.

Choosing this option guides me to a wizard where I get to select what data items I would like to share. The supported item types are data residing in tables or files in lakehouses and mirrored databases.

In this case, I choose to share an entire folder named File_Share.

Clicking ‘Save and Continue‘ leads me to a new dialog, where I get to assign who I want to share this data with. Sharing in this way does NOT require Entra B2B guest user access but relies on a dedicated Fabric-to-Fabric authentication mechanism. Also note that the sharer in Tenant A can’t control who has access to the data in Tenant B. Access can even be granted to guest users of Tenant B.

In this example the sharer can either choose to send the grant as an email, or copy the link and send it through Teams or another channel. The intended receiver has 90 days to accept the invitation, after which the invitation expires.

Accepting share from Tenant B

In order for the user in Tenant B to accept the share, they have to have access to a lakehouse that becomes the target of the share. Please see the link on how to set up a lakehouse.

Here the user Testy McTestify has created a workspace in Tenant B and also created a lakehouse called Tenant_B_Lakehouse.

Testy can now accept the share in more than one way: either from the email, by clicking the accept button that directs him to the Fabric portal where he will be guided through the next steps in accepting the invitation, or by simply clicking or pasting the link into a browser and beginning the same journey as above. Either way, the screen below will be presented once authorization has completed.

Testy McTestify is a user in the domain @catmansolution.com (Tenant B) and the invite was sent from Tenant A which is @catman.bi – this information is also present in the dialog, along with details on what is shared.

Now Testy has to select the lakehouse that will house the referenced folder (in this case). Here Testy chooses Tenant_B_Lakehouse.

And the final step is to place the shared folder in the files hierarchy that exists in Tenant_B_Lakehouse, and here Testy just places the folder in the root.

Two notifications will pop up and inform you of the relevant actions taken.

As soon as that process is completed (within seconds), the files from the folder in Tenant A are available as if they were present in Tenant B.

OneLake Explorer

Installing OneLake Explorer will allow Testy McTestify to access the same files and folders, synchronized on his device. This is, as you can imagine, immensely powerful, as almost every business user knows how to operate Windows File Explorer and OneDrive on their device – this is right up their alley and not some odd third-party product that IT needs to whitelist just for them. Chances are that OneLake Explorer is already in use in the organization and no further action from IT is needed.

I simply love the potential of this new feature, which I feel has traveled well below the radar, covered by all the AI and Copilot noise over the last couple of weeks.


Unexplainable behaviors with DefaultAzureCredential()

Long story, short (2 days later)

While implementing an Azure Function that is designed to fetch secrets from Azure Key Vault, I ran into a funny and odd issue. I am not able to explain why or what is going on, but I have tried every trick a Google search can conjure, at least until page 30 of the search results. It was by coincidence that I came across some of the parameters in the DefaultAzureCredentialOptions class that got me going, at least locally.

The idea, as far as I have understood it, is that whenever you invoke the Azure.Identity.DefaultAzureCredential class, it provides a flow for attempting authentication using one of the following credentials, in the listed order:

  • EnvironmentCredential
  • ManagedIdentityCredential
  • SharedTokenCacheCredential
  • VisualStudioCredential
  • VisualStudioCodeCredential
  • AzureCliCredential
  • AzurePowerShellCredential
  • InteractiveBrowserCredential (excluded by default)

I suspect that since I have deployed my Azure Function with the Managed Identity setting set to a System Assigned identity, like this:

System Assigned Identity

AND the fact that ManagedIdentityCredential comes before VisualStudioCredential in the authentication flow, it fails, since it is unable to authenticate the managed identity – which is the main principle of the design: none other than the service itself can assume the identity of the service.

See more detail here: https://learn.microsoft.com/en-us/azure/active-directory/managed-identities-azure-resources/overview
Snip

  • System-assigned. Some Azure resources, such as virtual machines allow you to enable a managed identity directly on the resource. When you enable a system-assigned managed identity:
    • A service principal of a special type is created in Azure AD for the identity. The service principal is tied to the lifecycle of that Azure resource. When the Azure resource is deleted, Azure automatically deletes the service principal for you.
    • By design, only that Azure resource can use this identity to request tokens from Azure AD.
    • You authorize the managed identity to have access to one or more services.
    • The name of the system-assigned service principal is always the same as the name of the Azure resource it is created for. For a deployment slot, the name of its system-assigned identity is <app-name>/slots/<slot-name>.

Love rears its ugly head

Having assigned the proper permissions in the Azure Key Vault, you are able to connect to said Key Vault using your credentials in Visual Studio. A code example of that could look like this:

using System;
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;

public static string GetSecret( string keyvault, string secretName )
{
    var kvUri = $"https://{keyvault}.vault.azure.net";

    // Walks the default authentication chain described above.
    var creds = new DefaultAzureCredential();

    // Fetch the secret from Key Vault (blocking on the async call).
    var client = new SecretClient(new Uri(kvUri), creds);
    var secretValue = client.GetSecretAsync(secretName).Result.Value.Value;

    return secretValue;
}

(link to NuGet: NuGet Gallery | Azure.Security.KeyVault.Secrets 4.5.0)

Usually this works, and I have no other explanation than that deploying the solution to a live, running App Service is what breaks this otherwise elegant piece of code. The code listed above does not work for me.

Workaround

You can instantiate the DefaultAzureCredential class using a constructor that takes a DefaultAzureCredentialOptions object as a parameter, and this object has a great number of attributes that are of interest. You can actively remove items from the authentication flow, and you can specify the tenant id if you have access to multiple tenants.

The code that resolved the issue locally looks something like this. (I can probably just do without the ManagedIdentity, will test)

using System;
using Azure.Identity;
using Azure.Security.KeyVault.Secrets;

public static string GetSecret( string keyvault, string secretName )
{
    var kvUri = $"https://{keyvault}.vault.azure.net";

    // Trim the authentication chain down to the Visual Studio credential
    // and pin the tenant, so local debugging authenticates as intended.
    var creds = new DefaultAzureCredential(
        new DefaultAzureCredentialOptions
        {
            TenantId = "<INSERT TENANT ID HERE>",
            ExcludeAzureCliCredential = true,
            ExcludeAzurePowerShellCredential = true,
            ExcludeSharedTokenCacheCredential = true,
            ExcludeVisualStudioCodeCredential = true,
            ExcludeEnvironmentCredential = true,
            ExcludeManagedIdentityCredential = true
        });

    var client = new SecretClient(new Uri(kvUri), creds);
    var secretValue = client.GetSecretAsync(secretName).Result.Value.Value;

    return secretValue;
}

I am not sure this will work when I deploy the solution, but I will probably add a check on the environment (local debug or running in prod).
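A minimal sketch of what such an environment check could look like is below. It assumes the WEBSITE_INSTANCE_ID environment variable, which Azure App Service sets when running in Azure, as the discriminator – an assumption on my part, not something I have tested yet.

using System;
using Azure.Identity;

public static DefaultAzureCredential GetCredential()
{
    // WEBSITE_INSTANCE_ID is set by Azure App Service / Functions when running in Azure (assumption).
    var runningInAzure = !string.IsNullOrEmpty(
        Environment.GetEnvironmentVariable("WEBSITE_INSTANCE_ID"));

    if (runningInAzure)
    {
        // In Azure: let the system-assigned managed identity do the work.
        return new DefaultAzureCredential();
    }

    // Local debug: skip managed identity and lean on the Visual Studio credential.
    return new DefaultAzureCredential(new DefaultAzureCredentialOptions
    {
        TenantId = "<INSERT TENANT ID HERE>",
        ExcludeManagedIdentityCredential = true,
        ExcludeEnvironmentCredential = true,
        ExcludeSharedTokenCacheCredential = true,
        ExcludeAzureCliCredential = true,
        ExcludeAzurePowerShellCredential = true,
        ExcludeVisualStudioCodeCredential = true
    });
}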

HTH


New Microsoft certifications passed

This summer my family and I spent almost three weeks driving to Germany and into Italy by car. Not just any car, I might add. The old Volvo clocked up 4,000 km and handled it like a charm 🥰 even when it was super packed for the final stage of the journey from the duty-free shop just across the border.

Main cities visited: Nürnberg, Venice, Bologna, Brisighella, Comacchio… and obviously I had to make a stop at the Mutti field of tomatoes, as Mutti is a client of ours 🙂

Assorted Pictures from the vacation

Just before the vacation began, I got notice that I had passed the two beta exams I took in the middle of May. With beta exams you do not get the passing score immediately; you have to wait till the program has collected enough data on the individual questions/answers to release the final version of the test.

Microsoft Power Automate RPA Developer (PL-500)

First off, I passed the Microsoft Power Automate RPA Developer (PL-500) exam, which was quite a stretch for me, and I had even raised some concerns about the scope of the test beforehand, in the announcement below on LinkedIn:

Candidates for this exam automate time-consuming and repetitive tasks by using Microsoft Power Automate (formerly known as Flow). They review solution requirements, create process documentation, and design, develop, troubleshoot, and evaluate solutions.

Candidates work with business stakeholders to improve and automate business workflows. They collaborate with administrators to deploy solutions to production environments, and they support solutions.

Additionally, candidates should have experience with JSON, cloud flows and desktop flows, integrating solutions with REST and SOAP services, analyzing data by using Microsoft Excel, VBScript, Visual Basic for Applications (VBA), HTML, JavaScript, one or more programming languages, and the Microsoft Power Platform suite of tools (AI Builder, Power Apps, Dataverse, and Power Virtual Agents).

 Important

Passing score: 700. Learn more about exam scores. (which is exactly what I scored 😁)

Part of the requirements for: Microsoft Certified: Power Automate RPA Developer Associate

The detailed skills are outlined here: Exam PL-500: Microsoft Power Automate RPA Developer – Skills Measured

Designing and Implementing Enterprise-Scale Analytics Solutions Using Microsoft Azure and Microsoft Power BI (DP-500)

Candidates for this exam should have advanced Power BI skills, including managing data repositories and data processing in the cloud and on-premises, along with using Power Query and Data Analysis Expressions (DAX). They should also be proficient in consuming data from Azure Synapse Analytics and should have experience querying relational databases, analyzing data by using Transact-SQL (T-SQL), and visualizing data.

 Important

Passing score: 700. Learn more about exam scores.

 Tip

Part of the requirements for: Microsoft Certified: Azure Enterprise Data Analyst Associate

Active certifications

New/Renewed 2022

Other Active Certifications


Microsoft Connected Learning Experience program

Excited to be a part of the panel for the Microsoft Connected Learning Experience program! Join us for the program starting April 22nd as we talk about the Azure Fundamentals exams and give you a personalized and prescriptive learning experience for exam readiness.


Study Smart…Not Hard! and pass MS Azure Fundamental exams.

Register here: https://clx.cloudevents.ai/

#avd #cloudlearning #microsoftcertified #microsoftazure #microsoft #CLX #MSAzureCLX #ConnectedLearningExperience #CloudLabs #SpektraSystems
