Testing your SSAS partitioning strategy with PowerShell

During one of my presentations, a discussion arose about which partitioning strategy to use when partitioning cubes. Most frequently, people seem to partition by some form of period, e.g. by year, month or week. That may seem like a sensible idea at the time, but wouldn’t a more fact-based approach be to actually put the strategy to the test?

Preface

There are several ways of approaching cube partitioning or, more precisely, measure group partitioning. One approach is to simply define the partitions in the SQL Server Data Tools (SSDT) project and then deploy/process the database for testing. This process is cumbersome and very manual, which in the end can easily lead to erroneous conclusions.
On CodePlex, there is a project called ASStoredProcedure, which is a collection of stored procedures (compiled C# code) for Analysis Services. By installing the assembly on a given SSAS instance, you gain access to all kinds of useful methods.

For more information on custom assemblies in Analysis Services, read this and this.

The resulting assembly contains a method called CreatePartitions:

CreatePartitions(CubeName, MeasureGroupName, SetString, [PartitionGrouperExpressionString])

Calling this method results in a measure group that is partitioned by the members of the designated set. In the following example, the measure group Internet Sales is partitioned by Year.

Code Example
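As a sketch, assuming the Adventure Works sample database, the call could look like this:

```mdx
CALL ASSP.CreatePartitions(
    "Adventure Works",
    "Internet Sales",
    "[Date].[Calendar Year].[Calendar Year].Members"
)
```

One partition is created per member of the set, and each partition gets its Slice attribute set to the corresponding member.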

It’s important to keep some of the obvious drawbacks of this approach in mind.
First of all, you need a fully processed cube in order to execute the method; having only the measure group and the dimension in question processed is not enough. This means that if the cube is considerable in size, the pain of processing is doubled during the creation of the partitions: process once in order to be able to create the partitions, then process the new partitions. For the second run you can elect to process the cube with Process Default rather than Process Full, which will detect the changes in the measure groups in question and only process those.
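That two-step processing flow can itself be scripted. A minimal sketch, assuming the SqlServer (or legacy SQLPS) PowerShell module is available and using placeholder names:

```powershell
# Sketch only: <SERVERNAME> and <DATABASENAME> are placeholders.
Import-Module SqlServer

# 1) Process Full: the cube must be fully processed before CreatePartitions can run
Invoke-ASCmd -Server "<SERVERNAME>" -Query @"
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID><DATABASENAME></DatabaseID>
  </Object>
  <Type>ProcessFull</Type>
</Process>
"@

# 2) After the partitions have been created, Process Default only picks up
#    the unprocessed objects instead of reprocessing everything
Invoke-ASCmd -Server "<SERVERNAME>" -Query @"
<Process xmlns="http://schemas.microsoft.com/analysisservices/2003/engine">
  <Object>
    <DatabaseID><DATABASENAME></DatabaseID>
  </Object>
  <Type>ProcessDefault</Type>
</Process>
"@
```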

Second, if data in a future process run exceeds the given boundaries of the partitions (specified in the Slice attribute, which the AS stored procedure sets at creation time), processing will break. One example: the underlying data source provides a rolling window, and if a specified partition ends up with no data, processing breaks. This behavior was a bit surprising to me. So you need to re-create the partitions on a regular basis, as often as the grain of your partitions allows for spill. This is particularly painful if the whole cube needs re-processing every time data is updated.

Semi-Automation

Obviously anyone can execute MDX by hand in SSMS. But the trick is to use PowerShell to do the hard work for you. With a few lines of code, we can easily set up the whole test, and you will want to test all your hypotheses.

First, we create two helper functions to cut down on the repetition.
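A minimal sketch of what two such helpers might look like (hypothetical names; assumes the ADOMD.NET client, Microsoft.AnalysisServices.AdomdClient, is installed): one executes an arbitrary MDX/XMLA command against the server, the other runs the MDX in a file and appends the elapsed time to a log file.

```powershell
[void][System.Reflection.Assembly]::LoadWithPartialName(
    "Microsoft.AnalysisServices.AdomdClient")

# Execute an MDX/XMLA command against the cube (no result set expected)
function Invoke-CubeCommand {
    param([string]$Server, [string]$Database, [string]$Command)
    $conn = New-Object Microsoft.AnalysisServices.AdomdClient.AdomdConnection(
        "Data Source=$Server;Catalog=$Database")
    $conn.Open()
    try {
        $cmd = $conn.CreateCommand()
        $cmd.CommandText = $Command
        [void]$cmd.ExecuteNonQuery()
    }
    finally { $conn.Close() }
}

# Run the MDX in a file against the cube and log how long it took
function Test-CubeQuery {
    param([string]$Server, [string]$Database, [string]$MdxPath, [string]$LogFile)
    $mdx = [System.IO.File]::ReadAllText($MdxPath)
    $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
    Invoke-CubeCommand -Server $Server -Database $Database -Command $mdx
    $stopwatch.Stop()
    Add-Content -Path $LogFile -Value $stopwatch.Elapsed
}
```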

Remember to replace <SERVERNAME>, <DATABASENAME> (don’t forget the clear-cache XMLA statement), <CUBENAME> and <MEASUREGROUPNAME> with your own values. Also adjust the path to the MDX you want run against the cubes. I tried to parameterize the method, but for some reason the function failed to convert the input to

When this is in place, we can loop over the sets we want our measure group to be partitioned by. In my example, I create the following partition schemes: Year per Datasource, Datasource, Year, Week, Store and finally Chain. This is where your knowledge of both your data demographics and your end users’ query statistics comes into play. Remember, partitioning can improve both query and processing performance.
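A sketch of that loop, with hypothetical dimension and hierarchy names that you would substitute with attribute hierarchies from your own cube, and the same placeholders as above:

```powershell
# Hypothetical set expressions; substitute hierarchies from your own cube.
$schemes = [ordered]@{
    "Year per Datasource" = "[Date].[Year].[Year].Members * [Datasource].[Datasource].[Datasource].Members"
    "Datasource"          = "[Datasource].[Datasource].[Datasource].Members"
    "Year"                = "[Date].[Year].[Year].Members"
    "Week"                = "[Date].[Week].[Week].Members"
    "Store"               = "[Store].[Store].[Store].Members"
    "Chain"               = "[Store].[Chain].[Chain].Members"
}

foreach ($name in $schemes.Keys) {
    Write-Host "Creating partitions: $name"
    $call = 'CALL ASSP.CreatePartitions("<CUBENAME>", "<MEASUREGROUPNAME>", "{0}")' -f $schemes[$name]
    Invoke-ASCmd -Server "<SERVERNAME>" -Database "<DATABASENAME>" -Query $call
    # ... then clear the cache, process, and replay the MDX workload for this scheme
}
```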

Depending on the size of your data, this can be quite a time-consuming endeavour, which is why automating it makes a lot of sense.
Here’s how the IDE displays the progress:

[Image: Cube Processing Progress]

One after another, the cubes are processed. The Write-Host statements are covered by the graphics.

I left one thing out of this example: testing processing durations. This is, however, an easy fix, as we can record the time when processing begins and when it ends, subtract the two, and get the duration. This way, we can check how processing performance is affected by the different partition schemes.
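A sketch of that timing, bracketing whichever process command you use (the log path is hypothetical):

```powershell
$processStart = Get-Date
# ... issue the process command for the current partition scheme here ...
$processEnd = Get-Date

# Subtracting two DateTimes yields a TimeSpan; append it to the log
$duration = $processEnd - $processStart
Add-Content -Path "C:\Temp\ProcessTimings.log" -Value $duration
```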

To view the rudimentary outcome, have a look at the file configured in the script. Timings will be listed as a long list of durations, one per line.

Results

With this data, you can put your scheme to the test: find out which partition scheme, if any at all, works best for your particular cube.
