If you’re not familiar, T-SQL Tuesday is a blogging party hosted by a different person each month. It’s a creation of Adam Machanic (b|l|t), and it’s been going on for ages now! Basically the host selects a topic, defines the rules (those are almost always the same), and then everyone else blogs about said topic. Once the deadline is reached, the host summarizes each of the submitted posts on their site/blog.
This month I get to pick the topic, and I am going to go with:
The Essential SQL Server Tools in my stack
Besides SQL Server Management Studio and Visual Studio Data Tools, we all have our own set of tools that we use for everyday chores and tasks. But how do we get to know which tools are out there, if not from other professionals telling us about them? Does it have to be a fully fledged product with certification and all? Certainly not! If there’s some GitHub project out there that is helping you be twice as productive, let us know about it. You can even boast about something you’ve built yourself – if you think others will benefit from using it.
Basically I think that by raising awareness of the tools that are out there, new professionals will have a less steep curve getting up to pace than they otherwise would have had. But I suspect that even some veterans could have an “a-ha” moment from reading the summary.
Additionally, you can (read: should) share how you came to depend on said tool – and of course you are encouraged to give credit where credit is due to whoever made you aware of it.
Another approach to this topic is to treat it as a kind of “A Day in the Life of” blog post, as has been done before by Erin Stellato (b|l|t), writing from the specific angle of describing how your everyday work is made easier by the use of your tool stack.
The How
There are only a few rules for T-SQL Tuesday:
Your post must be published on Tuesday April 10th 2018 between 00:00 GMT and 23:59 GMT.
Your post must contain the T-SQL Tuesday logo (see above) at the top and the image must link back to this blog post.
Trackbacks should work, but if they don’t, please put a link to your post in the comments section so I (and everyone else) can see your contribution!
My session will be revolving around Azure IoT Hub and a Raspberry Pi 3 device running the latest Windows 10 IoT Core operating system. So if you want to know more about the options for Internet of Things in Azure, this session is for you.
If you are in the neighborhood, you should definitely check out this single-day, _free_(!), event – all about SQL Server. There are a number of great sessions, and as usual, you will find it hard to build your own schedule; deciding which sessions to attend is always a matter of picking only one session out of several really good options for the same time slot. This event is no different.
Training Days (Thurs-Fri)
To fit in more content, this year the event has two full training days on Thursday and Friday. Spaces on some of the training days are limited to under 30, so book early to avail of additional discounts.
Click on one of the training days below for full details
Yesterday I was recognized by Microsoft, as I received an email in which they presented me with a Microsoft Most Valuable Professional (MVP) award. It was one of those moments that will stick with you for a lifetime. I am truly honored and very, very excited about the days to come in this new role. As far as community goes, nothing has changed. But with the award comes a great deal of responsibility towards the products, the product teams and Microsoft as a whole. I am really looking forward to embarking on that voyage.
In short, my excitement for today is best characterized as (this is supposed to be a gif – but using Word as a publishing tool doesn’t seem to do the trick):
It’s been a joyful ride that began back in 2013, when I gave my first public talk. Since then I have been fortunate enough to be selected to speak at a number of events such as SQLSaturday, MSBIP, Microsoft Campus Days and not least the great PASS Summit. On that note, I’d like to particularly thank Mark Broadbent (b|l|t), who twice has given me the opportunity to be part of his phenomenal SqlSat Cambridge events. Mark was also forthcoming in nominating me for this award, and for that I am very grateful. Also, a huge thank you goes out to Regis Baccaro (b|l|t), who on numerous occasions has given me the opportunity to both speak at and be part of the team organizing the SqlSat Denmark events for the last four (4) years. There are a lot of people whom I am thinking of right now, writing this piece. People from all over the world, some of whom I’ve never even met – but still, they are part of what made this journey so wonderful and interesting. So, a great thank you goes out to the #sqlfamily out there, wherever you may be. I will be looking forward to reconnecting with old acquaintances as well as making new ones in the time to come. I am always open for a chat, even about the Microsoft Data Platform.
There are a lot of great examples out there on how to build your own custom Time Intelligence into Analysis Services (MD). Just have a look at this, this, this, this and this. All good sources for solid Time Intelligence in SSAS.
One thing they have in common, though, is that they all make the assumption that there are, and always will be, 52 weeks in a year. The data set I am currently working with is built on the ISO 8601 standard. In short, this means that a 53rd full week recurs every so often, as opposed to only 52 weeks in the Gregorian version, which is defined by: 1 Gregorian calendar year = 52 weeks + 1 day (2 days in a leap year).
The 53rd week occurs roughly every five to six years. The last few times we saw 53 weeks in a year were 1998, 2004, 2009 and 2015; the next time will be in 2020. This gives you enough time to either forget about the hacks and hard-coded fixes in place to mitigate the issue OR bring your code into a good state, ready for the next time. I urge you to do the latter, as suggested by the work philosophy of the late Mærsk McKinney Møller: Constant Care.
The Real Issue
So why is this such a big deal? After all, it’s only a problem every five or six years.
For starters, some built-in functions in Analysis Services will go bonkers over this sudden alien week.
Your otherwise solid calculations will suddenly have holes, blanks and nulls in them. The perfect line chart is broken. Not to mention the pie chart, where a perhaps crucial piece is missing!
In my line of work there has been a great deal of discussion about how to treat this troublesome Week 53. One suggestion was to simply distribute the Week 53 sales across all the other weeks of the year. Each week would thereby contain a fraction (1/52) more sales than usual – this way, comparisons across years would even out. But what about companies that have a huge spike in sales around New Year’s Eve (think liquor)? They would then not be able to compare the real sales around New Year’s Eve, because those would be disguised by the extra sales spread over the rest of the year.
Our working solution is to compare against the same number of weeks as in the year you are currently operating with. In 2016 that’s 52 weeks; in 2015 it was 53 weeks.
The tricky part about this is to identify when to calculate what, and for this we need assistance from additional attributes in our calendar dimension.
New attributes to support this type of calculation are [Is Leap Week] and [Is Leap Year].
Is Leap Week has the value 1 whenever the current week is the 53rd week of the year. All other weeks are represented by a 0.
Is Leap Year has a value of 1 whenever the current year consists of 53 weeks. All other years are represented by the value 0. Arguably the name Leap Year could be considered confusing, as this normally means something else. Alternative names could be: Has53Weeks, HasExtraWeek or something along those lines.
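The original solution builds these columns in the relational calendar table, but to make the week logic concrete, here is a minimal sketch in M (the query language used later in this post) of how the two flags could be derived from ISO 8601 week numbers. The function names are mine and not part of the original solution:
let
    // ISO 8601 week number: the week containing the first Thursday of the year is week 1
    fnIsoWeek = (d as date) as number =>
        let
            // Thursday of the ISO week (weeks run Monday through Sunday) that d falls in
            thursday = Date.AddDays(d, 3 - Date.DayOfWeek(d, Day.Monday)),
            yearStart = #date(Date.Year(thursday), 1, 1)
        in
            Number.IntegerDivide(Duration.Days(thursday - yearStart), 7) + 1,
    // [Is Leap Week]: 1 when the date falls in week 53, otherwise 0
    fnIsLeapWeek = (d as date) as number => if fnIsoWeek(d) = 53 then 1 else 0,
    // [Is Leap Year]: a year has 53 ISO weeks exactly when 28 December falls in week 53
    fnIsLeapYear = (y as number) as number => if fnIsoWeek(#date(y, 12, 28)) = 53 then 1 else 0,
    // quick sanity check: 2015 has 53 weeks, 2016 does not
    Test = {fnIsLeapYear(2015), fnIsLeapYear(2016)}  // returns {1, 0}
in
    Test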
Getting Set Up
Your database table should look something along the lines of this:
Another table is needed for the Time Intelligence to work its magic. This one holds the members of something along the lines of the Date Tool dimension by Alberto Ferrari (b|l|t) and Marco Russo (b|l|t), which can be found here. My implementation differs a little bit; here’s how.
I have one dimension in the cube, named Date Tool. This dimension has two attributes, each with its own members. On one hand I’d like to control the overall scope/length of the calculation, i.e. is it 4 weeks aggregated or 8 weeks? This attribute I have named Aggregation Tool. The other element is when I want the calculation to occur, i.e. Today, Yesterday or Last Year. This attribute I have named Comparison Tool.
Members of the Aggregation Tool are: YTD, YTD LY, Last 4 Weeks, Last 8 Weeks, …, Last 52 Weeks. Members of the Comparison Tool are: Current Period (N), Last Period (N-1), Previous Period (N-2) and some that we actually don’t use.
The fact that the two attributes can be combined behind the scenes in the cube makes this a very powerful ally.
In the Cube
In the cube we need to address the time intelligence by adding a bit of MDX script. This relies on SCOPE assignments, which Chris Webb (b|l|t) has been kind enough to blog about here and present on at SqlBits VIII here; Pragmatic Works also has a video on the topic here.
Now, remember that we need to address the Week 53 issue and compare against a similar number of weeks, in particular for Last Year calculations that stretch across Week 53. Let’s say we look at 2016 Week 1 through 20; what’s the equivalent sale last year? In our case, it’s 2015 Week 2 through 21.
With a SCOPE statement, it’s possible to bend and twist the calculations as you see fit – in this case, to right-shift the calculation, so to speak. Here is how the calculation should look.
[Date Tool].[Aggregation Tool].&[1] and [Date Tool].[Comparison Tool].&[1] are the default values, Current Week and Current Period respectively.
The above SCOPE statement is invoked every time the [Date Tool].[Aggregation Tool].&[18] ~ [Date Tool].[Aggregation Tool].[Year To Date Last Year] member is present in an MDX query. So if this member is active in any slicer, this piece of code will run and the aggregation will be calculated accordingly.
Wrap Up
Before entering the domain of Retail, I would never have thought that periods could vary over time. Except maybe for my time in Government Healthcare, where a 13th month was introduced to right all the wrongs of the year. So, in other words, I guess there are many examples out there where the good old faithful calendar simply does not cut it. In those cases, the SCOPE assignment in SSAS really does some magic for you. But beware: SCOPE assignments done wrong can get you into serious trouble, leading to all kinds of odd or subtle miscalculations that you don’t detect right off the bat.
A final word on this approach: when dealing with SCOPE assignments, test every corner of your cube before you trust any of your calculations. More times than I care to count, I have been fooled by a SCOPE assignment.
My latest work-related project has offered me many first-time experiences. Most prominent is that this is my first fully fledged Power BI project. I have been playing around with Power BI for some time at home, and back in the summer of 2014 I ran some workshops with a selection of business people at Maersk Line.
The more curious first-time experience is that the company we are developing this Power BI model for was founded 368 years ago. Not months or weeks; years! They’ve been around, and then some.
My main technical discovery with this project was the use of try … otherwise in M. I stumbled upon the construct via Gerhard Brueckl (b|l|t) in this blog post of his on error handling.
I found a use for this construct not for direct error handling as such – but then again, in some way it is error handling.
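In its simplest form, try … otherwise returns the value of the first expression unless it raises an error, in which case the fallback is returned instead. A tiny, made-up example (not from Gerhard’s post):
let
    // the conversion succeeds, so the fallback is never used
    Good = try Number.FromText("42") otherwise 0,
    // Number.FromText raises an error here, so the fallback value 0 is returned instead
    Bad = try Number.FromText("not a number") otherwise 0
in
    {Good, Bad}  // {42, 0}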
Data comes in many shapes and forms, and this project is no exception. In order for the project not to explode into hundreds of functions and just as many queries, I had to come up with a way of containing this. So, for educational purposes, let’s say we get three different file types from a single source. This single source needs to be presented to the end user as one, and only one, source of data. Each of the three file types has a specific structure, which needs to be handled during import. There may be hundreds of files in total, but each file conforms to one, and only one, format.
Generating the Steps
After going through a normal GUI import of each file type in Excel or Power BI Desktop, we can copy out the M code from the Advanced Editor. Once we have the code, it’s fairly easy to convert it into a function which we can invoke at will (read: on each file in a folder).
Let’s go over this with a quick example. Given is a file containing data for some specific period.
We obviously want to model this into a better shape, so reporting is made easier. We do so manually in Power BI by applying different steps. Here’s the result of my steps.
How I got there is not really important for the sake of this demonstration, only that I can pick up the code behind it by looking at it in the Advanced Editor (apologies for the Danish labels).
The code will look something like this
let
Kilde = Excel.Workbook(File.Contents("C:\Users\jve.CMS\OneDrive - CatMan Solution A S\eksempel.xlsx"), null, true),
Tabel1_Table = Kilde{[Item="Tabel1",Kind="Table"]}[Data],
#"Ændret type" = Table.TransformColumnTypes(Tabel1_Table,{{"EAN", Int64.Type}, {"Date", type date}, {"Copenhagen", Int64.Type}, {"London", Int64.Type}, {"Tokyo", Int64.Type}, {"Berlin", Int64.Type}, {"Paris", Int64.Type}}),
#"Fjernet pivotering af andre kolonner" = Table.UnpivotOtherColumns(#"Ændret type", {"EAN", "Date"}, "Attribut", "Værdi"),
#"Omdøbte kolonner" = Table.RenameColumns(#"Fjernet pivotering af andre kolonner",{{"Attribut", "Store"}, {"Værdi", "Value"}})
in
#"Omdøbte kolonner"
or as in the screen shot
Create a Function in M
Turning the above code into a function is actually pretty straightforward. We know we are going to iterate over a folder full of files, so we know we will have the location of each file. Our function will then take a file (location) as a parameter.
This is done like this:
let
    #"Import" = (file) =>
    let
        // the fixed file path is replaced by the file parameter
        Kilde = Excel.Workbook(File.Contents(file), null, true),
        Tabel1_Table = Kilde{[Item="Tabel1",Kind="Table"]}[Data],
        #"Ændret type" = Table.TransformColumnTypes(Tabel1_Table,{{"EAN", Int64.Type}, {"Date", type date}, {"Copenhagen", Int64.Type}, {"London", Int64.Type}, {"Tokyo", Int64.Type}, {"Berlin", Int64.Type}, {"Paris", Int64.Type}}),
        #"Fjernet pivotering af andre kolonner" = Table.UnpivotOtherColumns(#"Ændret type", {"EAN", "Date"}, "Attribut", "Værdi"),
        #"Omdøbte kolonner" = Table.RenameColumns(#"Fjernet pivotering af andre kolonner",{{"Attribut", "Store"}, {"Værdi", "Value"}})
    in
        #"Omdøbte kolonner"
in
    #"Import"
Notice that only a few minute changes are needed: wrapping the steps in let #"Import" = (file) => … in #"Import" is almost all it takes. We only need to apply the parameter to the steps already defined, so we swap the fixed file path for the parameter file. Now we are set to create a new query that iterates over the files in a folder and invokes this new function, passing each file location as the parameter.
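As a sketch of what such a query could look like (the folder path is a placeholder, and I am assuming the function query has been given the name fnImportFileType1, which is the name used further down):
let
    // list every file in the source folder (the path here is just an example)
    Source = Folder.Files("C:\Data\Sales"),
    // build the full path that the import function expects as its parameter
    #"Added FullPath" = Table.AddColumn(Source, "FullPath", each [Folder Path] & [Name], type text),
    // invoke the import function once per file; each cell in the new column holds a table
    #"Added Data" = Table.AddColumn(#"Added FullPath", "Data", each fnImportFileType1([FullPath])),
    // expand the nested tables into ordinary rows
    #"Expanded Data" = Table.ExpandTableColumn(#"Added Data", "Data", {"EAN", "Date", "Store", "Value"})
in
    #"Expanded Data"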
Once this has been done for all of the different file types – three in this particular example – we need a way of appending the data from the different queries to one another. This is where the try … otherwise construct comes in handy.
Imagine we have a function available in the model for each of the three file types. Let’s call them fnImportFileType1, fnImportFileType2 and fnImportFileType3. A prerequisite for this to work properly is that we know for a fact that no one file will produce any meaningful result from more than one (1) of the functions. Otherwise we would have duplicate data, and while we love data, duplicating it should be done with careful consideration.
So rather than adding a column for each of the three file types, or simply generating a query for each one and then appending them all at the end, we can accomplish this with a little cleverness. Now, some might argue that the files should be sorted so that only those relevant for a given function are actually iterated over and parsed – but the world is not always perfect, or maybe there is no way of sorting the files (in a reasonable amount of time).
So, without further ado, here is a way of appending data from multiple sources/functions into one single column, which can then be expanded, in just two steps in M.
For each row, we first invoke fnImportFileType1; if that fails we invoke fnImportFileType2, and if that fails we invoke fnImportFileType3. Finally, we return null if all three fail to produce a result.
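A hedged sketch of those two steps (the folder path, the column names and the null fallback are my own assumptions, since the original post shows the actual code as a screenshot):
let
    // same folder listing as before (the path is an example)
    Files = Table.AddColumn(Folder.Files("C:\Data\Sales"), "FullPath", each [Folder Path] & [Name], type text),
    // step 1: try each import function in turn, falling back to null when none of them can parse the file
    #"Added Data" = Table.AddColumn(Files, "Data", each
        try fnImportFileType1([FullPath])
        otherwise try fnImportFileType2([FullPath])
        otherwise try fnImportFileType3([FullPath])
        otherwise null),
    // step 2: expand the single nested column into ordinary rows
    #"Expanded Data" = Table.ExpandTableColumn(#"Added Data", "Data", {"EAN", "Date", "Store", "Value"})
in
    #"Expanded Data"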
Conclusion
While I have no idea whether this will kill performance in the long run (it doesn’t right now, going through 500 files), I get that this is maybe not the perfect solution. However, I am able to produce a (valid) result for our client within a narrow time frame – which is something they really love.
I’d love any input on this, as I am learning as I go along.