Matt Gordon (b|l|t) is hosting this months TSQL2SDAY which is on the topic fixing an old issue/problem with shiny new toys. I am really happy about this topic, as it offers the opportunity to display not only the short comings of earlier versions, but also the progress made by the SQL Server team.
My contribution to this blog party, is going to be about SQL Server 2016 SP1, which is the edition of SQL Server we implemented the solution on. Before I was part of the company, there had been numerous attempts to solve what was know as the Zero Sales Issue. Working with Retailer/POS data some customers are interested in identifying if a product didn’t sell on any given combination of date, product and store, and in such case calculate some sort of lost potential for that store (on that specific date for that specific product). For some of our customers, this quickly becomes a super massive matrix, as we serve 3.500 stores on top of (for some) app. 5.000 products. The calculations were to be conducted on a two year running period (730 days). With this example we end up with a combination of 3.500 store x 5.000 products x 730 days = 12.7B rows, just for this particular costumer; We have potentially (at the moment of writing) 50+ customers.
Generating a table of this magnitude, along filling in the gaps of the days that actually have sale was previously a too time consuming task to offer this kind of analysis in our portfolio. Enter SQL Server 2016.
With SQL Server 2016 we were able to generate the table and fill in the blanks that was needed in order to do the calculation (Yay Window Functions!). After that, we are offering not only one (1) but three (3) different calculations on top of the data. Whenever a blank (a case of zero sale) is encountered, we calculate the average sales value of the same product in the same store over the last 7, 14 and 28 days. In addition to this, we also add a filtering mechanism, such that the client can focus on products that are “normally” selling on all days in the selected period. Products that are sold on rare occasions are not displaying the issue of Zero Sale, as this is supposed to identify if and when a store fails to offer the product in question. Empty shelves for a top selling product I think everyone can acknowledge is a serious issue.
The setup is sort of attached to our regular data import and is split out on a separate server of its own. We are currently migrating from on-premises to a more cloud based solution. Not sure when we will be fully in the cloud, in time I guess.
The server itself is a pretty standard setup, currently running on a Standard DS13 v2 Virtual Machine in Azure (8 cores and 56 GB memory). On top of that we’ve added a couple of striped disks in order to serve both data and sql-temp operations better. Read more on how to stripe disks on an Azure VM here.
This about covers the “hardware” specs of this setup and the rest is comprised of some in-memory tables, column store indexes as well as a fairly straight forward SSIS package.
In previous attempts (years back) the time consumed by doing the calculations was way over limit, hence the product/feature wasn’t offered to the market. This is particularly troublesome if it’s a client request. With column store indexes as the main contributor we are now able to accommodate for this type of calculation although not on the fly, in a more reasonable time frame which is all good and well when your data update is on a weekly basis.