Canary monitoring for Power BI semantic models
Quarterly sales season always brings the same ritual: a business user opens Power BI, clicks into the sales dashboard, and waits for the numbers to load. It has been a while since anyone looked at this particular report. The numbers look off. Someone hits refresh. The spinner turns, but the data does not budge. The last refresh was weeks ago.
That is when the emails start. “Can you check why the sales model hasn’t updated?” The support thread grows. IT investigates. The answer is usually not in the data. The refresh failed at 3 AM. Or the data source changed its connection string last Tuesday. Or someone revoked the service principal without telling anyone.
By the time the root cause is found, the report has been open in meetings for hours, and the numbers have been wrong for days.
I built Semantic Sonar because I wanted that conversation to happen at 3:01 AM when the refresh failed, not at 10:30 AM when the quarterly review is already underway. The idea is a canary: a lightweight DAX query fired at each semantic model on a schedule. If it comes back clean, the model is reachable and queryable. If it does not, something is wrong right now, not three days from now.
Registering the first model
The first page most people land on after setting up the system is the workspace browser. The alternative is typing workspace GUIDs and dataset IDs by hand, which I did exactly once before deciding to build the browser.
You pick a tenant from the dropdown, the browser calls the Power BI REST API with the tenant’s service principal, and it returns a list of workspaces. Expand a workspace and you see every semantic model in it with its display name. Click Register and a form pre-fills with the workspace ID, dataset ID, and a default DAX query:
EVALUATE ROW("Ping", 1)
That default is intentionally trivial. You want the canary to be cheap. Its job is not to test the data. Its job is to confirm that the model is reachable and that the service principal can still authenticate. A single-row EVALUATE ROW does that in under a second on any model that is working.
For most models I change the query to something slightly more meaningful: a measure that actually touches the fact table, so a permission problem on the underlying source would surface. But the principle stays the same: keep it light, keep it fast, run it often.
The interval sits at 60 minutes by default. For critical models I set it lower, 15 or 30 minutes. The minimum the system allows is 1 minute, though for most models that is more noise than signal.
The dashboard: what you see first
Once a few models are registered and the scheduler has had time to run, the dashboard starts telling its story.
Four numbers across the top: total models, active, failing, disabled. In a healthy fleet they read something like 24, 22, 1, 1. The single failing model is the one you care about. The disabled one was auto-disabled after 30 consecutive failures, which usually means it has been broken for a long time and nobody has got around to it.
Below the stat cards, there are two lists. The first is models at risk: anything with 10 or more consecutive failures. These are on a countdown. At 30 consecutive failures the system disables them automatically. The at-risk list is early warning. You have a window to investigate before the model drops off monitoring entirely.
The second list is recent failures: the last handful of failed checks across all models, with timestamps and the raw error message. That error message is usually enough to know immediately whether something needs attention now:
Power BI API returned 401: {"error":{"code":"TokenExpired",...}}
versus
Power BI API returned 404: {"error":{"code":"ItemNotFound",...}}
TokenExpired means a credential needs rotating. ItemNotFound means someone deleted or moved the model. Different problems, different urgency, different owner to call.
The dashboard auto-refreshes every 60 seconds. I leave it open on a side monitor. When the amber tile count goes up, I look.
Drilling into a model
From the dashboard or the models list, clicking through to a single model shows the full result history.
The top section repeats the key facts: current status badge (Active, Failing, Disabled), consecutive failure count, last run timestamp, last run success or failure. Below that is a latency chart. This is the view that catches the subtler problems.
A model that is technically succeeding but whose latency has climbed from 200ms to 4,000ms over a week is not a healthy model. Something upstream changed. Maybe a gateway is under new load. Maybe a DirectQuery source started doing full scans. The trend is visible here because every result gets recorded with its latency in milliseconds.
The full result history below the chart shows each check: timestamp, success or failure, latency, row count, and the first row returned as JSON. That last one is useful. If your canary query is:
EVALUATE ROW("SalesTotal", [Sum of Sales Value])
then the first row shows the actual value that came back. A value of 0 that held steady for months and then jumped to null is interesting. It does not mean the model is broken in the way the system defines it, but it means something changed.
The models list: the fleet view
The models list page is the operational view. It has search, filter by tenant, filter by status (all/active/disabled/failing), and filter by tag. Tags are free-form strings you can add to any model, so you can group by environment (production, staging), by team, or by data domain.
Sort by health score ascending and the most degraded models float to the top. The health score is a composite across consecutive failure count, recent latency relative to the model’s own baseline, and uptime across the rolling windows. It is not a formal SLA calculation; it is a signal for where to look first.
The card for each model shows the health grade (A through F), the last run time, the consecutive failure count, and a quick-run button. That last one is useful during an investigation: rather than waiting for the next scheduled run, you can trigger a check immediately and watch the result appear in the history a few seconds later.
Uptime: putting numbers on it
The Uptime page is for the conversation with stakeholders who want to know how reliable their models are in terms they can put in a report.
Each model gets three rolling windows: last 24 hours, last 7 days, last 30 days. Each window shows total checks, successes, and uptime percentage with a colour-coded bar:
- 99.5% and above: green
- 95 to 99.5%: yellow
- 90 to 95%: orange
- Below 90%: red
Across the top of the table, the aggregate uptime across all models for each window gives a fleet-level headline. “Fleet availability last 7 days: 97.4%” is a concrete number to bring to a monthly review.
The same page also shows p95 latency trends per model. Uptime tells you whether the checks are succeeding. Latency tells you whether the models that are succeeding are doing so at acceptable speed.
Radar: the visual overview
The radar page is the one I use when I want to scan the whole fleet at a glance. It renders each model as a dot on a canvas, positioned by p95 latency on one axis and health score on the other, with dot size reflecting which models have the most weight (based on how frequently they run). Colour is the health grade.
In a healthy deployment most dots cluster in the top-left: low latency, high health score, green or lime. The outliers stand out immediately. A large red dot in the bottom-right means a high-frequency model with degraded health and elevated latency. That is the one to investigate first.
Hovering a dot shows a tooltip with model name, tenant, score, grade, and current p95 in milliseconds.
Maintenance windows: the expected failures
One thing that caused false alarms early on was scheduled maintenance. The underlying SQL Server gets taken offline every Saturday between 2 and 4 AM. The canary runs every 30 minutes. The result: a string of failures, the at-risk counter climbing, an alert firing at 2:15 AM for something everyone already knew was happening.
Maintenance windows solve that. Each window has a start time, end time, and day-of-week pattern. Two flags control what happens during the window. suppressAlerts keeps running checks and recording results but does not fire webhooks. skipCanary stops running checks entirely, which is what you want when the data source is deliberately taken offline.
The scheduler checks for an active window before it enqueues a job. If skipCanary is set, it advances the next run time and skips the queue. The model’s consecutive failure counter does not increment. When the window ends, monitoring resumes from a clean state.
Webhooks: getting alerted
The dashboard is useful when you are looking at it. For everything else there are webhooks.
A webhook fires on three events: a model transitioning from healthy to failing, a model recovering after failures, and a model being auto-disabled. You configure one or more webhook URLs on a model, each with an optional shared secret. The payload is a JSON object with the event type, model name, tenant, and a details string.
Payloads are signed with HMAC-SHA256 and the signature goes into an X-Signature header, so the receiving system can verify the sender before acting on the message. Delivery is best-effort: a failed webhook delivery is logged in Application Insights but does not affect the canary result or retry behaviour.
In practice I point webhooks at a Power Automate flow that posts to Teams. A failure notification lands in the right channel within a minute of the first failed check.
The multi-tenant model
The system was built from the start to monitor multiple customers’ models from a single deployment. That shapes the credential design more than anything else.
Each customer is a tenant record in Cosmos DB: their display name, their Entra ID (the GUID of their Azure AD tenant), and the client ID of an app registration created in their tenant. The client secret for that app registration goes into Key Vault with the name tenant-{tenantId}-client-secret.
When the worker runs a check, it reads the right secret, acquires an OAuth 2.0 token scoped to the customer’s Entra tenant, and calls the Power BI executeQueries endpoint on their behalf using their own service principal. The token is cached in memory with a 5-minute buffer before expiry.
var clientSecret = await _keyVault.GetTenantClientSecretAsync(tenant.EntraId, ct);
var credential = new ClientSecretCredential(tenant.EntraId, tenant.ClientId, clientSecret);
All other service-to-service calls (the Function App talking to Cosmos DB, Key Vault, and the Storage Queue) use a system-assigned managed identity. No connection strings in configuration. The Power BI API is the one exception: it requires a credential that the customer’s tenant has consented to, and a managed identity from the hosting tenant cannot satisfy that requirement. So that leg stays on client credentials.
One edge case: live connections and SSAS
Most models work fine with the executeQueries endpoint. The ones that do not are models live-connected to Analysis Services or on-premises SSAS through a gateway, where the gateway blocks the query execution endpoint but still allows the metadata API.
For those, setting queryMode to rest on the model switches the worker to a lighter ping that calls the dataset metadata endpoint instead. It does not execute a DAX query, but it confirms that the service principal can reach the model and that the Power BI API returns a 200. If the gateway is down, that returns a 400 or 503. The rest of the pipeline, the result recording, the consecutive failure counter, the dashboard, is identical.
Audit log
The audit page records configuration changes: who added a model, who changed an interval, who disabled a tenant. It is the answer to “what changed right before the failures started.” Filter by tenant or by entity ID. The last 100 entries load by default.
I added this after the third time someone changed a model’s DAX query to something that always returns zero rows, and nobody could remember who had done it or when.
Deployment
The infrastructure is Bicep, deployed via the Azure Developer CLI:
azd auth login
azd up
That provisions everything: the Static Web App, the Function App, Cosmos DB, Key Vault, Storage, and Application Insights. The Bicep modules are in infra/modules/, and the Azure Developer CLI service definitions in azure.yaml handle the build and deploy sequence for both the .NET Functions project and the Next.js frontend.
After deployment, you add the first tenant and model through the UI, store the client secret in Key Vault, and wait for the first scheduled run. The README has the full post-deployment checklist.
What it does not do
Semantic Sonar tells you whether a model is reachable and queryable. It does not tell you whether the numbers coming out of it are correct. A model that returns [Sum of Sales Value] = 0 because the ETL truncated the fact table will pass every canary check.
The “failing” status also does not distinguish between the model being down, the underlying data source being down, and the service principal’s permissions having expired. The error message in the result history usually makes it clear, but the system does not attempt to classify the cause.
The 90-day TTL on results caps the uptime windows at 30 days. If you want longer retention you can change the TTL in CanaryResult.cs and the Cosmos DB container configuration, but the uptime calculation logic would need updating to match.
Getting the code
The repository is at github.com/vestergaardj/semantic-sonar. The README covers prerequisites, local development, and the full post-deployment walkthrough.