#TSQL2SDAY – Data Modeling Gone Wrong

This month marks the 72nd T-SQL Tuesday. Adam Machanic (b|l|t) started the T-SQL Tuesday blog party in December of 2009. Each month an invitation is sent out on the first Tuesday of the month, inviting bloggers to participate in a common topic. On the second Tuesday of the month all the bloggers post their contributions to the event for everyone to read. The host sums up all the participants’ entries at the end of the week.
This month Mickey Stuewe (b|l|t) is the host and the topic is …

Data Modeling Gone Wrong

I am really looking forward to the other entries – because “cleverness” never seems to know any boundaries when it comes to Database Design or Modeling, and I just know there are some crazy things going on out there. So be sure to tune in to the host summary, which will appear on Mickey’s blog in the near future.

For my own part, I will refer to the latest “bright idea” that I came across. The application in question was based on SQL Server and was used to track price quotes. The application on top was in active use at that point in time, enhancements were being developed, and the business depended heavily on it being online.

In general there were three (3) types of quotes. For each type, there was a set of tables in the database that was almost identical. Only some attributes were different, depending on the type, but an inconsistent naming convention still gave away the major relations between tables in the database.
This meant that the application relied on three (3) sets of almost identical tables. This could most definitely have been designed differently, but it’s not the real cluster f**k of this application, so I will not go into detail here.
In every table of the database, there was a column whose name ended in ‘key’. There was also a column whose name ended in ‘id’ in all of the tables. At first sight that looked like two tech leads having a ball. But actually it was not. I discovered later that there had been only one “architect” behind this application. The good thing about that was that the problem was easy to “contain”, since that particular employee was no longer working on the project. :)

After some investigation and data profiling, I slowly homed in on the fact that the data in column blablabla_key and column blablabla_id wasn’t related in any way. Nor was key in one table related to key in any other table, and neither was id. In fact, there were no foreign keys defined in any of the tables, so no referential integrity was enforced. In theory, even if I found the right match, crappy data could obscure the fact that there really was a relation.

Further investigation led to the conclusion that id and key were in no way inter-related. So id in one table was not related to key in another table resembling the naming convention. No, it turns out that for every quote, the business operates with a term called validity. The id of the respective validity, one (1) of three (3), is related to a specific quote key, which was in no way reflected in the naming convention of either tables or columns. E.g. QuoteKey was related to ValidityId, in each of the respective quote type schemas. But that’s not all. For each of the three types of quote, two validities had been combined into one table. On top of that, additional logic had to be applied: if one validity was not present in said table, the other should take over. Sort of a fail-safe validity. This meant that keys, if not present, were to be treated differently…

Oh Come on!

Needless to say, I spent a good amount of time trying to figure out how to combine the data, so we could do some proper business intelligence on top. Conference call after conference call left me fruitless, and to this day I am not sure I could have asked the developers any differently. The disconnect between me and them was so huge, too huge to identify it seems. Asking one thing and getting a seemingly sane answer just made it even more difficult, because none of the answers proved valid when querying the database. In the end, senselessly profiling the database made the relations stick out.

So, to wrap up: get serious about naming conventions, even if they seem to be a pain. Secondly, and more importantly, do define the relations in the database. There are several benefits to doing that, even though you will probably meet someone who will argue otherwise.
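Just to illustrate the point, here is a minimal sketch of declaring such a relation. The server, database, table and column names below are made up for the example (loosely modelled on the quote/validity setup described above), and it assumes the SQLPS module is available for Invoke-Sqlcmd:

# Hypothetical example: make the Quote -> Validity relation explicit in the database.
Import-Module SQLPS -DisableNameChecking

$ForeignKey = @"
ALTER TABLE dbo.Quote
    ADD CONSTRAINT FK_Quote_Validity
    FOREIGN KEY (ValidityId) REFERENCES dbo.Validity (ValidityId);
"@

Invoke-Sqlcmd -ServerInstance 'SQLSERVER01' -Database 'Quotes' -Query $ForeignKey

With that single constraint in place, the engine rejects orphaned ValidityId values, and anyone profiling the database later can read the relation straight out of the metadata instead of guessing.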

As a smart exit remark, I wanted to end with a quote (only fitting for this post), but unfortunately I wasn’t able to find its original source. I did some extensive searching on Twitter yesterday, without any luck. To the best of my knowledge, this is something Adam Machanic (b|l|t) supposedly once said. Don’t kill me if it’s not :)

“Performance doesn’t matter, when you have Dirty Data.”

What I read out of this quote is that if you have dirty data, no performance in the world will ever make up for that. This is where Foreign Key constraints become your best friend. There are actually several good articles out there on Foreign Key Constraints:

Do Foreign Key Constraints Help Performance? by Grant Fritchey (b|l|t)
Foreign Key Constraints: Friend or Frenemy? by Mickey Stuewe (b|l|t)
Do Foreign Keys Matter for Insert Speed? by Jeremiah Peschka (b|l|t)

Thanks to Mickey Stuewe for hosting such a brilliant topic. I very much look forward to reading the other entries!


Everything Microsoft BI (as of late)

Lately Microsoft seems to have stepped even harder on the gas, releasing amazing feature upon feature for both SQL Server 2016 and Power BI, as well as their cloud platform Azure. This blog post will run through some of the features I find most interesting in the latest couple of releases. With that said, Microsoft is releasing new features on a weekly basis for the Power BI service, on a monthly basis for the Power BI Designer, and has made three Community Technology Previews (CTPs) available over the summer for SQL Server 2016.
Just back from PASS Summit, I am a happy camper, full of all the great new stuff that is already there, in preview, or about to hit us with the vNext of SQL Server.
See it all in a foundation presentation by James Philips (t|l), Corporate VP at Microsoft.

Without further ado, here’s my list of points of interest – it may be a mess to you, but it reflects my personal interest in the Microsoft tool stack at this moment.

SQL Server 2016

Stretch Database, for individual tables. See blog post by Marco Freccia, CTP2
Quick test, is this feature for me?


Multiple TempDB files. See blog post by Jeff Shurak, CTP2.4
SQL Server now defaults to 8 data files or the number of cores (read: threads), whichever is less, for the TempDB files.

AlwaysEncrypted, see blog post by TechNet, CTP3
Always Encrypted allows clients to encrypt sensitive data inside client applications and never reveal the encryption keys to the Database Engine.

Native JSON Support, see blog post by Jovan Popovic, CTP3
The feature set is growing with each CTP release; currently SQL Server can format and export data as JSON strings, load JSON text into tables, extract values from JSON text, and index properties in JSON text stored in columns, with more to come.

Temporal Tables, see MSDN Article, CTP3
Allows you to keep a full history of data changes and allows easy point-in-time analysis. Temporal tables are a new type of user table in SQL Server 2016.
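A quick illustration of the new syntax (the table, database and instance names here are my own, not from the MSDN article):

# Hypothetical example: a system-versioned (temporal) table in SQL Server 2016 CTP3.
Import-Module SQLPS -DisableNameChecking

$TemporalTable = @"
CREATE TABLE dbo.Quote
(
    QuoteKey  INT            NOT NULL PRIMARY KEY CLUSTERED,
    Price     DECIMAL(18, 2) NOT NULL,
    ValidFrom DATETIME2 GENERATED ALWAYS AS ROW START NOT NULL,
    ValidTo   DATETIME2 GENERATED ALWAYS AS ROW END   NOT NULL,
    PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.QuoteHistory));
"@

Invoke-Sqlcmd -ServerInstance 'SQLSERVER01' -Database 'Quotes' -Query $TemporalTable

Point-in-time queries then use SELECT ... FROM dbo.Quote FOR SYSTEM_TIME AS OF '<some datetime>'.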

Row Level Security (RLS), see MSDN Article, CTP3
RLS enables you to implement restrictions on data row access – for example, ensuring that workers can access only those data rows that are pertinent to their department, or restricting a customer’s data access to only the data relevant to their company.

PolyBase, see MSDN Article
PolyBase allows you to use T-SQL statements to access and query, in an ad hoc fashion, data stored in Hadoop or Azure Blob Storage.

Parallel Partition Processing in Tabular, see blog post by Haidong Huang, CTP2
Allows for faster processing of Tabular cubes. Mind the default settings though; read the blog post.

Single Package Deployment (Project Mode), see blog post by Andy Leonard (b|l|t)
Now you’ll be able to deploy a single package in a project, to fix a bug or a similar scenario. Read the blog post, as there is no such thing as a free lunch!


Azure

AlwaysEncrypted for Azure SQL Database, see blog post on MSDN, Preview

Read more about all Azure Preview Features here

Azure Data Lake Store
The Data Lake Store provides a single repository where you can capture data of any size, type and speed, simply and without forcing changes to your application as the data scales.

Azure Data Lake Analytics Service
The analytics service can handle jobs of any scale instantly, by simply setting the dial for how much power you need. You only pay for your job while it is running, making it cost-effective.

Azure IoT Hub
Connect, monitor, and control millions of IoT assets running on a broad set of operating systems and protocols.

Data Catalog
Data Catalog lets users—from analysts to data scientists to developers—register, discover, understand, and consume data sources.

SQL Data Warehouse
Azure SQL Data Warehouse is an elastic data warehouse as a service with enterprise-grade features based on the SQL Server massively parallel processing architecture.

If that’s not enough for you, check out the impressive set of Azure services here.

Power BI

I can’t keep up!! (which is probably a good thing)

Duplicate Report Page, read the Power BI weekly update blog
A common scenario when creating reports is to have multiple pages that are identical except for having different filters applied.

Collapse Navigation Pane through a URL parameter, read the Power BI weekly update blog
Allows you to add a URL parameter to your link that will automatically collapse the left navigation pane for any visitor who clicks the link.

Full Screen Mode, read the Power BI weekly update blog
Enables Full Screen Mode for Power BI dashboards and reports.

If you want to see some of the amazing things you can already do in Power BI, please visit the Power BI Best Visual Contest page

Not to mention all the features updated each month in the Power BI Desktop application, see latest updates here.

If you find something missing, Microsoft is actually listening, so please feel free to register and suggest your fantastic new idea at this site.


Summarizing my #Summit15

Finally I am able to coherently write down my takeaways from this year’s PASS Summit. I came a long way, and endured all kinds of dangers on the way to Seattle. Not sure if the person who had my seat on the previous flight had been shot, but there was a distinct hole in the window. WTF, Lufthansa!:

Lufthansa Hole in the Window

Gun crazed Americans? Or was it ze Germans?

As always, at least for me, the schedule changes as soon as the summit begins. This happened this time around as well. I had planned on seeing a bunch of sessions on various topics, but as game day came along, my choices changed. On Day 1, only Dimensional Modeling Design Patterns: Beyond the Basics by Jason Horner (b|t) came out strong. Jason is an awesome presenter, so if you ever get the chance, go for it! I knew most of the stuff in advance, but sometimes it’s nice to have reassurance of what you think is the proper road ahead. Jason’s session gave me just that. SQL Server on Azure Virtual Machines: Features, Best Practices & Roadmap by Luis Vargas was a bit disappointing, as there were almost no best practices tied into the presentation. Luis ran out of time and had to race through the final 4-5 slides of the deck, leaving no time to go into detail. Finally, wrapping up Day 1 was Cortana Analytics Deep Dive: Big Data Stores by Rajesh Dadhia and Matt Usher, which started out as a very solid presentation. But. The so-called demonstration of the Azure Data Lake import into Azure SQL Data Warehouse through Azure Data Lake Analytics was not a demo, at least not in my opinion. Presenting a bunch of screen dumps (with errors occurring) is not a demo.
As soon as the last session ended, 40+ Danish attendees went to dine at the Pike Brewing Company. Jørgen Guldmann (b|l) captured the night out on his blog. Good times.
Day 2 ended up being all about my Lightning Talk – which I think went well. No tomatoes, no rotten eggs, so I guess it wasn’t that bad :)
Day 3 began with some U-SQL Query Execution and Performance Tuning by Ed Triou, which didn’t end too well, as the speaker decided to go 13 minutes over time, leaving only two (2) minutes for Jason Thomas (b|l|t) to get ready for Memoirs of Building a 150 GB (and Growing) SSAS Tabular Model. This session was surprisingly more on hardware than on architecture, which I somehow had set my mind on. Nevertheless, Jason delivered, as usual, a cool, calm and collected session. Good job!


Right after, in the same room, came Bill Anton with Analysis Services: Show Me Where It Hurts. If you work with Analysis Services and do not yet have logging and performance counter collection in place, have a look at the session description, the downloads and Bill’s blog, which shows how to get up and running quickly.
I spent the afternoon attending Mike Diehl’s Agile Analytics: Making Progress Visible, which, again, was one of those sessions that provided the reassurance of being on the right track. No groundbreaking news, but that’s OK on many levels. Mike did a great job presenting, and I think his message is very important to the community. Work smarter, not harder.
I don’t really know how to describe the last session of the day: How to Build a Virtual Test Lab for SQL Server by Ed Leighton-Dick and David Klee. The session began really strong (read: funny) with this video (which I think is both hilarious and scary at the same time)

But after that, it became a showcase of who had the geekiest hardware at home. I realize that being a geek is, by now, socially accepted. But I honestly thought we (yes, I just labeled myself a geek there) had moved on from the “I’ll show you how much of a geek I am by displaying the size of my soldering station” – sigh.
I was hoping to see some insight, from an undisputed authority in the field, into how to set up your _virtual_ dev/test environment. We ended up hearing too much about how much old spare hardware parts would fetch on eBay. Again, I don’t know what to say…

This year in Seattle, I had time to visit the Space Needle, where I played around, making this panoramic photo:
Seattle Skyline

Oh yeah, final pro tip: When dining at Local 360, ask for the price before ordering – These two babies cost me $50 and I was expecting half that…

Expensive Short Ribs


PASS Summit 2015 – Personal Schedule

Only 20 hrs until my flight takes off from CPH -> FRA -> SEA – but who’s counting? :)
This year is no different from any other year at PASS Summit (saying that as a 2nd-year veteran). The time spent building your schedule is probably the most frustrating part leading up to the conference. This goes for SQL Bits as well, another well-planned SQL Server event.

This year I opted out of the pre-cons for various reasons, mainly because I was very disappointed last year in terms of cost vs. output. Although I must say a couple of them looked really interesting, especially The Complete Primer to SQL Server Virtualization by David Klee, Argenis Fernandez and Jimmy May, Optimize “All Data” with a Modern Data Warehouse Solution by Bradley Ball and Josh Luedeman, and not least The Enterprise Scripting Workshop by Sean McCown and Jennifer McCown. But the fact that I have three kids aged eight, six and two makes my trip to Seattle short, this year at least. I had also planned on attending the comeback of Big Ben in Pittsburgh hosting Cincinnati, but hopefully they can manage without me (fingers crossed).

So, in short, my sessions of interest will be:




As with any schedule, this is subject to change. I certainly always find myself making “game time” decisions influenced by whoever I am talking to in between sessions. Often there is some aspect that others point out that influences the final call. Sometimes I even “miss” sessions, just to socialize and extend my network – there is always someone interesting to talk to, no matter where you turn at PASS Summit.

Hope to see you there!


Using PowerShell to Discover Temp/Log/Backup and Data Directories of SSAS

Objective: Locate all folders related to SQL Server Analysis Services (SSAS) on a given list of servers.

During a massive cleanup of some of our environments, I was asked to point out all the possible locations of SSAS-related files on a list of 10+ servers. The main reason this is not a trivial task that can be done blindfolded is that, at some point in time, someone decided to change the default values in the SSAS configuration. This change led to new cubes being deployed to a different location than the existing ones. Also contributing to the mess was a dubious naming standard, which made guessing the folder names a difficult task.

My immediate strategy was to check if I could use Analysis Management Objects (AMO) to get the current values from the configuration of each instance. I happen to be very fond of the AMO API, so this was my first instinct. Unfortunately there were several drawbacks and shortcomings to this approach, as only the Data directory is exposed, and I would have no way of telling the historic values of each property (TempDir, LogDir, BackupDir and DataDir).

Asking the question on Twitter (using the hashtag #ssashelp) whether there was some clever way of getting this information returned only a few responses, from Bill Anton (b|l|t) and Chris Webb (b|l|t) – but nothing directly solving the issue at hand. Bill shared this link on looking up Instance Configuration via PowerShell and Chris shared this one on Using MDX to browse the File System.

I think the API should at least expose the current values for all possible configurations on an SSAS instance. I have filed a Microsoft Connect ticket for this. But since that is not currently available, the only option was to script it myself.
The script produces a CSV file containing a list of servers with the corresponding paths to all folders containing files of the specified extensions. In this example I have chosen to go for .abf (Analysis Services backup file), .dstore (compressed decoding store for SSAS) and .trc (SSAS trace file).
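A minimal sketch of that approach is shown below. It assumes the listed servers expose their fixed drives through the administrative shares (\\server\c$ and so on), that remote WMI queries are allowed, and that your account has read access; $Servers and $Path are placeholders you need to set:

# Sketch: find all folders on the listed servers that contain SSAS-related files.
$Servers    = @('SSASSERVER01', 'SSASSERVER02')   # placeholder server names
$Path       = 'C:\Temp\SsasFolders.csv'           # where to write the result
$Extensions = '*.abf', '*.dstore', '*.trc'

$Result = foreach ($Server in $Servers) {
    # Enumerate the fixed logical drives on the remote server via WMI
    $Drives = Get-WmiObject -Class Win32_LogicalDisk -ComputerName $Server -Filter "DriveType = 3"
    foreach ($Drive in $Drives) {
        # Translate e.g. C: into the administrative share \\SERVER\C$
        $Root = '\\{0}\{1}' -f $Server, $Drive.DeviceID.Replace(':', '$')
        Get-ChildItem -Path $Root -Recurse -Include $Extensions -ErrorAction SilentlyContinue |
            Select-Object @{ n = 'Server'; e = { $Server } },
                          @{ n = 'Folder'; e = { $_.DirectoryName } } -Unique
    }
}

$Result | Export-Csv -Path $Path -NoTypeInformation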


Obviously this script comes with the normal DON’T JUST COPY/PASTE INTO PRODUCTION disclaimer. Please read through the script and adapt it for your own use. $Path and $Servers should be set accordingly.

Lastly you need to be aware of security. Your account may not be granted the proper permissions to actually remotely crawl the files and folders – please make sure this is addressed in advance.

Note also that this script will scan all logical drives on all listed servers, which can take some time.
