TSQL Tuesday #93 – My Interviewing Experiences

Kendra Little (b|l|t) is hosting this month’s TSQL2SDAY, which is on the topic of:

TSQL Tuesday #93: Interviewing Patterns & Anti-Patterns

What advice do you have for people preparing for or going through an interview?

I got off to a late start, and I apologize in advance for what may be a bit of a messy blog post.

Background

A couple of years ago, when I was working at a pretty large blueish container shipping company, we got the opportunity to interview candidates for our off-shore team. This was, as far as I know, a first in the company, and it only came to be because my manager at the time was adamant about it.

We quickly decided that I was to take on the technical part of the interview and my manager the more person/team-oriented aspects. So I devised a good number of questions, so that I could accommodate any specialization the candidate might have. We had to cover a lot, because our landscape was, well, let’s just put it this way: not very single-vendor-ish.
To make a long story short, the setup in the company at that point in time was that the Data Warehouse (DWH) was situated on a Teradata box. On top of that there were several options for business reporting. One such option was Microsoft Reporting Services (SSRS), which was done in the department where I was sitting. The reports were, however, not sourced directly from Teradata, but from one of the many Analysis Services (SSAS) cubes we hosted as well. All data was sourced (into the DWH) using Informatica, in yet another department, so Integration Services (SSIS) was officially (and unofficially) not allowed as a sourcing tool, which is why no questions reflect that aspect of the Microsoft BI stack.

So, we were looking for strong profiles in SSRS and SSAS, as well as Teradata knowledge, along with a general understanding of data warehouse fundamentals, of course. In other words, not junior profiles.

Preparation

In order to lock in on the level of the person being interviewed, I prepared a set of questions with increasing technical difficulty. Since my knowledge of Teradata was limited, I asked around among the DBAs and decided, after some contemplation, to completely avoid checking for technical Teradata skills. We would have to train them, if need be.
The list of questions evolved during the whole process. I homed in on the general level of the applicants after firing questions east and west. Being more precise gave everyone a much better experience: conversations tend to stop when no one understands you, and you don’t want to look like a donkey asking about overly simple stuff.
So hopefully you get the opportunity to tune into the level of the applicant before you unload your tech quiz.

The Interviews

I wound up having three categories of questions:

Proficiency

Do they have any particular strengths? Usually people tend to have a favorite tech in which they excel. For us it was important not to hire only SSRS Wizards/Ninjas or whatever they’re called nowadays. We wanted a fair mix.

I would ask the interviewees to rate themselves in the different tools and languages and use their rating as a pointer to which technical questions I would ask. If they rated themselves > 7/10 in MDX but had trouble naming at least five (5) frequently used functions, you know something is rotten.

Something I notoriously have gotten wrong would be:

  • Q: Explain what you get from having Enable Visual Totals checked?
  • A: When Enable Visual Totals is checked, the Grand Total reflects only the total you are allowed to see. The Grand Total is not a Grand Total, but your local total. At least that’s how it works in my mind!
Data Modeling

Data modeling was usually the area where the interviewees were least knowledgeable, so the questions in this category were kept at a very high level.

  • Q: What’s the difference between a data mart and a data warehouse?
  • A: A data mart is a subset of the data warehouse.
  • Q: Do you know any architectural principles of data warehousing?
  • A: Inmon & Kimball
  • Q: Delivery Date and Shipping Date in a fact table – how would you model that?
  • A: I would use a role-playing dimension, Time – see the sketch below.
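
To illustrate that last answer, here is a minimal T-SQL sketch of a role-playing date dimension; the table and column names are made up for the example, not taken from any system mentioned above:

-- One physical date dimension...
CREATE TABLE dbo.DimDate
(
    DateKey INT  NOT NULL PRIMARY KEY, -- e.g. 20170808
    [Date]  DATE NOT NULL
);

-- ...referenced twice by the fact table, once per role (Delivery and Shipping)
CREATE TABLE dbo.FactOrderLine
(
    OrderLineId     BIGINT        NOT NULL PRIMARY KEY,
    DeliveryDateKey INT           NOT NULL
        CONSTRAINT FK_FactOrderLine_DeliveryDate REFERENCES dbo.DimDate (DateKey),
    ShippingDateKey INT           NOT NULL
        CONSTRAINT FK_FactOrderLine_ShippingDate REFERENCES dbo.DimDate (DateKey),
    SalesAmount     DECIMAL(19,4) NOT NULL
);

In SSAS the single DimDate table is then exposed once per role – as a Delivery Date and a Shipping Date dimension – which is exactly what makes it role-playing.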
Optimization

In this category of questions, I’d ask them to explain how they would optimize, say, a poorly performing dimension in SSAS.
What about query performance tuning – what would they look for, and how would they attack the issue?

Final Question

But, the one question I always looked forward to was:

Tell me about a Business issue you solved, which you are particularly proud of.

This question I grew fond of because it allows the interviewee to brag about how cool they were, doing this and that awesome thing to save the business a Centillion Dollars. It lets the person really shine, and at the same time you get a feeling for their level of engagement and technical flair, not their skill set as such. Don’t you want to know how ingenious they can be at their best? This will give you a hint in that direction, or at least it did for me.

Tests

Back in the days before the big blue container hauling company, I was also involved in the hiring process. I was in fact tested myself when I first began working there, and in my experience tests can answer two immediate questions:

  1. Was it all talk and no walk?
  2. Does the person think and act like you expect them to?

1)

I’ve had a number of people skip the test after we’d had a good 1–1½ hour talk. Excuses were flying around like butterflies in the summer.

“Oh, I forgot! It’s my Grandmother’s birthday, I have to leave urgently. Bye Bye!”

“My car is double parked… I need to move it immediately!”

“Is that the time?”

– Never to be heard from again.

For this kind of situation, a test is very handy – apparently there are a fair number of bul$hitters out there, and fortunately the test prevented us from hiring them.

2)

In my mind, a test should display how the person got to the result, rather than just showing a number. I want to see the T-SQL code, the MDX, M or whatever they used to solve the puzzle. This tells me way more than a result and a completion time.

With the actual code in hand, you can go into a discussion (something a lot of coders dislike) about how they came up with what they submitted as their final answer. By discussing the code, you get a sense of their personality when dealing with their own code. Expect a lot of back-pedaling, as time, room temperature, humidity and other factors can have a high impact on the delicate nature of some coders 😉

Summary

There are a lot of ways to conduct an interview, and I bet my humble contribution to this tsql2sday is one of the more novice ones, but I hope I have allowed you to see some of the thought processes I’ve had when interviewing people for a position. I am very much looking forward to seeing the other contributions to this blog party – can hardly wait.

Thanks Kendra for hosting!

SET Pressure ON;

This fall is going to be my busiest so far in terms of speaking. I have been accepted for SqlSaturday in Oslo, where I am going to be talking about BI DevOps, which I am really stoked about. Supposedly our Northern neighbors are to host one h… of a SqlSaturday. I hosted a session on this topic at the recent SqlNexus event, and it seems quite popular – not only judging by attendance, but also by the fact that Mark Broadbent (b|l|t) and I have agreed that I will do a mini-con on the same topic at SqlSaturday Cambridge. See here for Part 1 and here for Part 2.

Please use this link to register for the mini-con: BI DevOps Mini-Con
(All proceeds will go to charity)

I am thrilled to go back to Cambridge again this year. It seems I am only there in odd years: 2013, 2015 and now 2017. This year, besides the mini-con, I am on for a session on Pro Techniques for the SSASMD Developer – a session I have delivered on several occasions, most recently at SqlSaturday in Prague.

Update:
On a more local scale, I am hosting a Power BI workshop in Maribo on Lolland August 23rd. More information via the Meetup invite here.

But before these three events, I am to give a session at the Power BI World Tour, which is held in Copenhagen – more precisely at Microsoft HQ in Kgs. Lyngby, just north of Copenhagen. Here I will be demonstrating how to connect a Raspberry Pi running Windows 10 IoT Core to Azure and read off the device live in Power BI.

Topping off the quarter, I will be in Seattle, where I am doing not one but two sessions: one on BI DevOps (again) and one on <drum roll please> how to connect a Raspberry Pi with Azure Stream Analytics and Power BI. Really, REALLY, excited to be selected to speak at the PASS Summit. I have only done a Lightning Talk there, back in 2015, and I am both humbled and scared to have two(!) general session slots at the greatest data platform event in the world. I have however had a few test runs before the event, so that should hopefully have settled the worst of the nerves.

This year I am proud to state that I am 2/2 on submissions to the Summit.

I hope to be going back to the U.S. in December to speak in Washington at the SqlSaturday event there. If I succeed in getting accepted, I have a cunning plan to go home via Pittsburgh to catch the Ravens taking a beating. Fingers X’ed.

 

I was awarded the Microsoft Data Platform MVP

Yesterday I was distinguished by Microsoft, as I received an email in which they awarded me a Microsoft Most Valuable Professional (MVP) award. It was one of those moments that’ll stick with you for a lifetime. I am truly honored and very, very excited about the days to come in this new role. As far as community goes, nothing has changed. But with the award comes a great deal of responsibility towards the products, the product teams and Microsoft as a whole. I am really looking forward to embarking on that ship.

In short, my excitement for today is best characterized as (this is supposed to be a gif – but using Word as a publishing tool doesn’t seem to do the trick):

It’s been a joyful ride that began back in 2013 when I gave my first public talk. Since then I have been fortunate enough to be selected to speak at a number of events such as SQLSaturday, MSBIP, Microsoft Campus Days and not least the great PASS Summit. On that note, I’d like to particularly thank Mark Broadbent (b|l|t), who has twice given me the opportunity to be part of his phenomenal SqlSat Cambridge events. Mark was also forthcoming about nominating me for this award, and for that I am very grateful. Also, a huge thank you goes out to Regis Baccaro (b|l|t), who on numerous occasions has given me the opportunity to both speak and be part of the team organizing the SqlSat Denmark events for the last four (4) years. There are a lot of people whom I am thinking of right now, writing this piece. People from all over the world, some of whom I’ve never even met – but still, they are part of what made this journey so wonderful and interesting. So, a great thank you goes out to the #sqlfamily out there, wherever you may be. I will be looking forward to reconnecting with old acquaintances as well as making new ones in the time to come. I am always open for a chat, even about the Microsoft Data Platform.

Thank You #sqlfamily

 

SSAS(MD) Time Intelligence w/ a Twist

Introduction

There are a lot of great examples out there on how to build your own custom Time Intelligence into Analysis Services (MD). Just have a look at this, this, this, this and this. All good sources for solid Time Intelligence in SSAS.
One thing they have in common, though, is that they all make the assumption that there are, and always will be, 52 weeks in a year. The data set I am currently working with is built on the ISO 8601 standard. In short, this means that a 53rd full week recurs, as opposed to only 52 weeks in the Gregorian version, which is defined by: 1 Gregorian calendar year = 52 weeks + 1 day (2 days in a leap year).

The 53rd week occurs approximately every five to six years, though the interval is not constant. The last few times we saw 53 weeks in a year were 1998, 2004, 2009 and 2015, and the next time will be in 2020. This gives you enough time to either forget about the hacks and hard-coded fixes in place to mitigate the issue OR bring your code into a good state, ready for the next time. I urge you to do the latter, as suggested by the work philosophy of the late Mærsk McKinney Møller: Constant Care.

The Real Issue

So why is this such a big deal? After all, it’s only a problem every five or six years or so.

For starters, some built-in functions in Analysis Services will go bonkers over this sudden alien week.

What is the ParallelPeriod of Week 53?
It doesn’t exist!

or

Which weeks are Cousins to Week 53?
None!

Your otherwise solid calculations will suddenly have holes, blanks and nulls in them. The perfect line chart, broken. Not to mention the pie chart, where a perhaps crucial piece is missing!

In my line of work there has been a great deal of discussion about how to treat this troublesome Week 53. One suggestion was to simply distribute the sales of Week 53 across all the other weeks of the year, every week thereby containing a fraction (1/52) more sales than usual – this way, comparisons across years would even out. But what about companies that have a huge spike in sales around New Year’s Eve (think liquor)? They would then not be able to compare the real sales around New Year’s Eve, because the spike would be disguised by the extra sales smeared over the rest of the year.

Our working solution is to compare against the same number of weeks as the year you are currently operating in. In 2016 that’s 52 weeks; in 2015 it was 53 weeks.
The tricky part about this is identifying when to calculate what, and for this we need assistance from additional attributes in our calendar dimension.
New attributes to support this type of calculation are [Is Leap Week] and [Is Leap Year].

Is Leap Week has the value 1 whenever the current week is the 53rd week of the year. All other weeks are represented by a 0.

Is Leap Year has a value of 1 whenever the current year consists of 53 weeks. All other years are represented by the value 0. Arguably the name Leap Year could be considered confusing, as this normally means something else. Alternative names could be: Has53Weeks, HasExtraWeek or something along those lines.
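
As a rough sketch of how these two flags could be derived in T-SQL (the table and column names are assumptions for the example, matching the calendar sketch in the next section, not the actual model), the ISO week number can be combined with the rule that an ISO year has 53 weeks exactly when 28 December falls in week 53:

SELECT
    d.[Date],
    DATEPART(ISO_WEEK, d.[Date]) AS IsoWeek,
    -- ISO year = calendar year of the Thursday in the same ISO week
    YEAR(DATEADD(DAY, 3 - (DATEDIFF(DAY, 0, d.[Date]) % 7), d.[Date])) AS IsoYear,
    -- Week 53 marker
    CASE WHEN DATEPART(ISO_WEEK, d.[Date]) = 53 THEN 1 ELSE 0 END AS IsLeapWeek,
    -- 28 December always sits in the last ISO week of its ISO year
    CASE WHEN DATEPART(ISO_WEEK,
              DATEFROMPARTS(YEAR(DATEADD(DAY, 3 - (DATEDIFF(DAY, 0, d.[Date]) % 7), d.[Date])), 12, 28)) = 53
         THEN 1 ELSE 0 END AS IsLeapYear
FROM dbo.DimCalendar AS d;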

Getting Set Up

Your database table should look something along the lines of this:
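
As a minimal sketch (the names and types are assumptions for illustration, not the actual schema):

CREATE TABLE dbo.DimCalendar
(
    DateKey    INT      NOT NULL PRIMARY KEY, -- e.g. 20161231
    [Date]     DATE     NOT NULL,
    [Year]     SMALLINT NOT NULL,             -- ISO year
    [Week]     TINYINT  NOT NULL,             -- ISO week number (1-53)
    IsLeapWeek BIT      NOT NULL,             -- 1 when [Week] = 53
    IsLeapYear BIT      NOT NULL              -- 1 when the ISO year holds 53 weeks
);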

Another table is needed for the Time Intelligence to work its magic – this one holds the members of something along the lines of the Date Tool dimension by Alberto Ferrari (b|l|t) and Marco Russo (b|l|t), which can be found here. My implementation differs a little bit; here’s how.

I have one dimension in the cube, named Date Tool. This dimension has two attributes, with members accordingly. On the one hand, I’d like to control the overall scope/length of the calculation, i.e. is it 4 weeks aggregated or 8 weeks? This attribute I have named Aggregation Tool. The other element is when I want the calculation to occur, i.e. Today, Yesterday or Last Year. This attribute I have named Comparison Tool.

Members of the Aggregation Tool are: YTD, YTD LY, Last 4 Weeks, Last 8 Weeks, …, Last 52 Weeks.
Members of the Comparison Tool are: Current Period (N), Last Period (N-1), Previous Period (N-2) and some that we actually don’t use.

The fact that the two attributes can be combined behind the scenes in the cube, makes this a very powerful ally.
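
A minimal sketch of the relational table behind such a Date Tool dimension could look like this – the key values 1 and 18 are the ones referenced later in the post, while the table and column names (and the idea of one row per combination) are assumptions for the example:

CREATE TABLE dbo.DateTool
(
    DateToolKey        INT          NOT NULL PRIMARY KEY,
    AggregationToolKey INT          NOT NULL, -- e.g. 1 = Current Week, 18 = Year To Date Last Year
    AggregationTool    NVARCHAR(50) NOT NULL,
    ComparisonToolKey  INT          NOT NULL, -- e.g. 1 = Current Period (N)
    ComparisonTool     NVARCHAR(50) NOT NULL
);

-- A couple of illustrative members; one row per combination the cube should expose
INSERT INTO dbo.DateTool
VALUES (1,  1, N'Current Week',           1, N'Current Period (N)'),
       (2, 18, N'Year To Date Last Year', 1, N'Current Period (N)');

In the classic Date Tool pattern the dimension has no relationship to any measure group, which is what allows the two attributes to be combined freely by the MDX script.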

In the Cube

In the cube we need to address the time intelligence by adding a bit of MDX script. This relies on SCOPE assignments, which Chris Webb (b|l|t) has been kind enough to blog about here and present about at SqlBits VIII here, and Pragmatic Works has a video on the topic here.

Now, remember that we need to address the Week 53 issue and compare against a similar number of weeks, in particular for Last Year calculations that stretch across Week 53. Let’s say 2016 Week 1 through 20 – what’s the equivalent sale last year? In our case, it’s 2015 Week 2 through 21.

With a SCOPE statement it’s possible to bend and twist the calculations as you see fit – in this case, to right-shift the calculation, so to speak. Here is what the calculation should look like.

Note: Id 18 => Year To Date Last Year
SCOPE( [DateTool].[Aggregation Tool].&[18] );
    SCOPE( DESCENDANTS( [Time].[Year Week Date].[All] ) );
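        // Date level: anchor the YTD sum on the date 364 days (52 weeks) back, and subtract
        // the first week of the previous year when that year is flagged as a 53-week year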
        SCOPE([Time].[Year Week Date].[Date]);
            THIS = SUM( YTD( [Time].[Year Week Date].CurrentMember.Lag(364) ), ( [DateTool].[Aggregation Tool].&[1], [DateTool].[Comparison Tool].&[1] ) )                        
                        - IIF( [Time].[Year Week Date].CurrentMember.Parent.Parent.PrevMember.Properties( "Is Leap Year" )
                            , ( [Time].[Year Week Date].CurrentMember.Parent.Parent.PrevMember.FirstChild, [DateTool].[Aggregation Tool].&[1], [DateTool].[Comparison Tool].&[1] )
                            , NULL
                        )
                    ;
        END SCOPE;
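        // Week level: anchor the YTD sum on the week 52 weeks back, add a correction when the
        // current week is flagged as week 53, and subtract the first week of the previous year
        // when that year is flagged as a 53-week year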
        SCOPE([Time].[Year Week Date].[Week]);
            THIS = SUM( YTD( [Time].[Year Week Date].CurrentMember.Lag(52) ), ( [DateTool].[Aggregation Tool].&[1], [DateTool].[Comparison Tool].&[1] ) )
                        + IIF( [Time].[Year Week Date].CurrentMember.Properties("Is Leap Week")
                            , SUM( {[Time].[Year Week Date].CurrentMember.Parent}, ( [DateTool].[Aggregation Tool].&[1], [DateTool].[Comparison Tool].&[10] ) )
                                + ( [Time].[Year Week Date].CurrentMember.Parent.FirstChild, [DateTool].[Aggregation Tool].&[1], [DateTool].[Comparison Tool].&[1] )
                            , NULL                        
                        )
                        - IIF( [Time].[Year Week Date].CurrentMember.Parent.PrevMember.Properties( "Is Leap Year" )
                            , ( [Time].[Year Week Date].CurrentMember.Parent.PrevMember.FirstChild, [DateTool].[Aggregation Tool].&[1], [DateTool].[Comparison Tool].&[1] )
                            , NULL
                        ) 
                                                 
                    ;                
        END SCOPE;       
    END SCOPE;
END SCOPE;
[Date Tool].[Aggregation Tool].&[1] and [Date Tool].[Comparison Tool].&[1] are the default values, Current Week and Current Period respectively.

The above SCOPE statement is invoked every time the [Date Tool].[Aggregation Tool].&[18] ~ [Date Tool].[Aggregation Tool].[Year To Date Last Year] member is present in an MDX query. So if this member is active in any slicer, this piece of code will run and the aggregation will be calculated accordingly.

Wrap Up

Before entering the domain of retail, I would never have thought that periods could vary over time – except maybe for my time in Government Healthcare, where a 13th month was introduced to right all the wrongs of the year. In other words, I guess there are many examples out there where the good old faithful calendar simply does not cut it. In those cases, SCOPE assignments in SSAS really do some magic for you. But beware: SCOPE assignments done wrong can get you into serious trouble, leading to all kinds of odd or subtle miscalculations that you don’t detect right off the bat.

A final word on this approach: test every corner of your cube before you trust any of your calculations when dealing with SCOPE assignments. More times than I care to count, I have been fooled by a SCOPE assignment.

 

T-SQL Tuesday #87 – Fixing Old Problems with Shiny New Toys

Matt Gordon (b|l|t) is hosting this month’s TSQL2SDAY, which is on the topic of fixing an old issue/problem with shiny new toys. I am really happy about this topic, as it offers the opportunity to display not only the shortcomings of earlier versions, but also the progress made by the SQL Server team.

Story Line

My contribution to this blog party is going to be about SQL Server 2016 SP1, which is the version of SQL Server we implemented the solution on. Before I was part of the company, there had been numerous attempts to solve what was known as the Zero Sales Issue. Working with retailer/POS data, some customers are interested in identifying whether a product didn’t sell for any given combination of date, product and store, and in such cases calculating some sort of lost potential for that store (on that specific date, for that specific product). For some of our customers this quickly becomes a super massive matrix, as we serve 3,500 stores on top of (for some) approx. 5,000 products. The calculations were to be conducted on a two-year running period (730 days). With this example we end up with a combination of 3,500 stores x 5,000 products x 730 days ≈ 12.8B rows, just for this particular customer – and we have potentially (at the time of writing) 50+ customers.
Generating a table of this magnitude, along with filling in the gaps for the days that actually have sales, was previously too time-consuming a task for us to offer this kind of analysis in our portfolio. Enter SQL Server 2016.

With SQL Server 2016 we were able to generate the table and fill in the blanks that were needed in order to do the calculation (yay, window functions!). On top of that data we now offer not only one (1) but three (3) different calculations. Whenever a blank (a case of zero sales) is encountered, we calculate the average sales value of the same product in the same store over the last 7, 14 and 28 days. In addition to this, we add a filtering mechanism, so the client can focus on products that “normally” sell on all days in the selected period. Products that only sell on rare occasions do not display the Zero Sales issue, as the analysis is supposed to identify if and when a store fails to offer the product in question. Empty shelves for a top-selling product is, I think everyone can acknowledge, a serious issue.
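
For context, here is a heavily simplified sketch of the idea – not the actual production code, and dbo.DimDate, dbo.DimStore, dbo.DimProduct and dbo.FactSales are made-up names: cross join the dimensions to build the full matrix, left join the sales facts, and let a window function compute the trailing average used when a zero-sale row is encountered.

-- Simplified sketch: build the full date x store x product matrix, flag zero-sale days,
-- and compute a 7-day trailing average as the "lost potential" estimate
WITH Matrix AS
(
    SELECT d.DateKey, s.StoreKey, p.ProductKey
    FROM dbo.DimDate          AS d
    CROSS JOIN dbo.DimStore   AS s
    CROSS JOIN dbo.DimProduct AS p
    WHERE d.[Date] >= DATEADD(DAY, -730, CAST(GETDATE() AS DATE)) -- two-year running window
),
Joined AS
(
    SELECT m.DateKey, m.StoreKey, m.ProductKey,
           COALESCE(f.SalesAmount, 0) AS SalesAmount
    FROM Matrix AS m
    LEFT JOIN dbo.FactSales AS f
           ON  f.DateKey    = m.DateKey
           AND f.StoreKey   = m.StoreKey
           AND f.ProductKey = m.ProductKey
)
SELECT DateKey, StoreKey, ProductKey, SalesAmount,
       CASE WHEN SalesAmount = 0 THEN 1 ELSE 0 END AS IsZeroSale,
       -- Trailing 7-day average for the same product in the same store
       -- (the matrix holds one row per day, so 7 preceding rows = 7 preceding days)
       AVG(SalesAmount) OVER (PARTITION BY StoreKey, ProductKey
                              ORDER BY DateKey
                              ROWS BETWEEN 7 PRECEDING AND 1 PRECEDING) AS Avg7Days
FROM Joined;

The 14- and 28-day averages follow the same pattern, simply by widening the window frame.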

Tech specs

The setup is sort of attached to our regular data import, but is split out onto a separate server of its own. We are currently migrating from on-premises to a more cloud-based solution. Not sure when we will be fully in the cloud – in time, I guess.

The server itself is a pretty standard setup, currently running on a Standard DS13 v2 virtual machine in Azure (8 cores and 56 GB memory). On top of that we’ve added a couple of striped disks in order to serve both data and tempdb operations better. Read more on how to stripe disks on an Azure VM here.

That about covers the “hardware” specs of this setup; the rest comprises some in-memory tables, columnstore indexes and a fairly straightforward SSIS package.
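
As an illustration of the storage side (again, the table and column names are made up for the example), the resulting matrix could be backed by a clustered columnstore index, which is what makes scanning billions of rows for these aggregations feasible:

CREATE TABLE dbo.ZeroSalesMatrix
(
    DateKey     INT           NOT NULL,
    StoreKey    INT           NOT NULL,
    ProductKey  INT           NOT NULL,
    SalesAmount DECIMAL(19,4) NOT NULL,
    Avg7Days    DECIMAL(19,4) NULL,
    Avg14Days   DECIMAL(19,4) NULL,
    Avg28Days   DECIMAL(19,4) NULL
);

CREATE CLUSTERED COLUMNSTORE INDEX CCI_ZeroSalesMatrix
    ON dbo.ZeroSalesMatrix;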

Conclusion

In previous attempts (years back) the time consumed by the calculations was way over the limit, hence the product/feature wasn’t offered to the market – which is particularly troublesome when it’s a client request. With columnstore indexes as the main contributor, we are now able to accommodate this type of calculation – although not on the fly – in a much more reasonable time frame, which is all well and good when your data is updated on a weekly basis.