Testing your SSAS partitioning strategy w/ PowerShell

During one of my presentations, a discussion arose about which partitioning strategy to use when partitioning cubes. Most frequently, people seem to partition by some form of period, e.g. by year, month or week. Rationally that may seem like a good idea at the time, but wouldn’t it be a more fact-based approach to actually put the strategy to a test?

Preface

There are several ways of approaching cube, or more precisely, measure group partitioning. One approach is to simply define the partitions in the SQL Server Data Tools [SSDT] project and then deploy/process the database for testing. This process is cumbersome and very manual, which in the end can easily lead to erroneous conclusions.
On CodePlex there is a project called ASStoredProcedure, which is a collection of stored procedures (compiled C# code). By installing the assembly on a given SSAS instance, you get access to all kinds of useful methods.

For more information on custom assemblies in Analysis Services, read this and this.

In the resulting assembly is a method called CreatePartitions(…)
CreatePartitions(CubeName, MeasureGroupName, SetString, [PartitionGrouperExpressionString])

The outcome of calling this method is a measure group that is partitioned by the members present in the designated set. In the following example the measure group Internet Sales is going to be partitioned by Year.

Code Example

call ASSP.CreatePartitions("Adventure Works", "Internet Sales", "[Date].[Calendar Year].[Calendar Year].Members")

It’s important to keep some of the obvious drawbacks of this approach in mind.
First of all, you need a fully processed cube in order to execute the method; having only the measure group and the dimension in question processed is not enough. This means that if the cube is considerable in size, the pain of processing is doubled during the creation of the partitions: process initially in order to create the partitions, then process the new partitions. For the second pass you can elect to process the cube with Process Default rather than Process Full, which will detect the changes in the measure groups in question and only process those.
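
A minimal sketch of that second pass, using the Invoke-ProcessCube cmdlet from the sqlascmdlets module that also appears in the script further down (the placeholder names are mine):

Import-Module "sqlascmdlets"

# Process Default only picks up unprocessed/invalidated objects, i.e. the newly created partitions
Invoke-ProcessCube -Server "<SERVERNAME>" -Database "<DATABASENAME>" -Name "<CUBENAME>" -ProcessType "ProcessDefault"

# Process Full, by contrast, would rebuild the entire cube from scratch
#Invoke-ProcessCube -Server "<SERVERNAME>" -Database "<DATABASENAME>" -Name "<CUBENAME>" -ProcessType "ProcessFull"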

Second, if data in a future processing run exceeds the given boundaries of the partitions (specified in the slice attribute, which the AS Stored Procedure sets at creation time), processing will break. One example: the underlying data source provides a rolling window, and if a specified partition ends up with no data, processing breaks. This behavior was a bit surprising to me. So you need to re-create the partitions on a regular basis – as often as the grain of your partitions allows for spill. This is particularly hurtful when/if the whole cube needs re-processing each and every time data is updated.

Semi-Automation

Obviously everyone can execute MDX by hand in SSMS, but the trick is to use PowerShell to do the hard work for you. With a few lines of code we can easily set up the whole test – and you want to test all of your theses.

First we create two helper functions, to assist with the repetitiveness.

Import-module "sqlps" –disablenamechecking
Import-module "sqlascmdlets"

cls;

function ExecuteTestMDX() {
	$result = @()
        #Single MDX Statement
	$Query = [System.IO.File]::ReadAllText("C:\MDXs\Test.mdx")
	
	Invoke-ASCmd -Server "<SERVERNAME>" -Database "<DATABASENAME>" -Query "<ClearCache xmlns=`"http://schemas.microsoft.com/analysisservices/2003/engine`"><Object><DatabaseID><DATABASENAME></DatabaseID></Object></ClearCache>" | Out-Null
	#Write-Host "Executing MDX`r"
	if( $Query -ne $null ) {
                #Querying 10 times
		for( $i = 0; $i -lt 10; $i++ ) {
			$BeginTime = Get-Date
			Invoke-ASCmd -Server "<SERVERNAME>" -Database "<DATABASENAME>" -Query $Query | Out-Null
			$EndTime = Get-Date
			Write-Host "." -NoNewline
			$result +=  ( $EndTime - $BeginTime ).TotalMilliseconds 
		}
	}
	Write-Host "`r"
	return $result
}

function CreatePartition( [string]$Set ) {
    #Write-Host $Server $Database $Cube $Set
	Invoke-ASCmd -Server "<SERVERNAME>" -Database "<DATABASENAME>" -Query "call ASSP.CreatePartitions(`"<CUBENAME>`", `"<MEASUREGROUPNAME>`", `"$Set`")" | Out-Null
	Write-Host "`r`nPartition Created"
	Invoke-ProcessCube -Server "<SERVERNAME>" -Database "<DATABASENAME>" -Name "<CUBENAME>" -ProcessType "ProcessDefault" | Out-Null
	Write-Host "Cube Processed"
	return $Set
}

Remember to replace <SERVERNAME>, <DATABASENAME> (don’t forget the one inside the clear-cache XMLA statement), <CUBENAME> and <MEASUREGROUPNAME> with your own values. Also adjust the path to the MDX you want to run against the cube. I tried to parametrize the functions further, but for some reason the input failed to convert as expected.

When this is in place, we can loop over the sets we want our measure group to be partitioned by. In my example, I create the following partition schemes: Year per Datasource, Datasource, Year, Week, Store and finally Chain. This is where your knowledge about both your data demographics and your end users’ query statistics comes into play. Remember, partitioning can improve both query and processing performance.

$Results = @()
$Sets = @("[Time].[Year Week Date].[Year] * [DataSource].[Datasource].[Datasource]", "[DataSource].[Datasource].[Datasource]", "[Time].[Year Week Date].[Year]", "[Time].[Year Week Date].[Week]", "[Store].[Store].[Store]", "[Store].[Chain].[Chain]" )

foreach( $Set in $Sets ) {
    Try {
        $Results += CreatePartition( $Set )
        $Results += ExecuteTestMDX
    } Catch {
        Write-Host "Failed to create or query: $Set"
    }
} 

$Results | Out-File -FilePath C:\SSAS\Output\Results.txt

Depending on the size of your data, this can be quite a time-consuming endeavour, which is why automating it makes a lot of sense.
Here’s how the IDE displays the progress:

[Image: Cube Processing Progress]

One after another, the cubes are processed; the progress written by the Write-Host statements is what you see in the screenshot.

I left out one thing from this example: testing processing durations. This is, however, an easy fix, as we can record the time when processing begins as well as when it ends, subtract the two and get the duration. This way we can check how processing performance is affected by the different partition schemes.
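
As a minimal sketch (same placeholders as in the functions above), the timing could be wrapped around the Invoke-ProcessCube call in CreatePartition, and the resulting duration appended to $Results next to the query timings:

$ProcessBegin = Get-Date
Invoke-ProcessCube -Server "<SERVERNAME>" -Database "<DATABASENAME>" -Name "<CUBENAME>" -ProcessType "ProcessDefault" | Out-Null
$ProcessEnd = Get-Date
# Processing duration in milliseconds for the current partition scheme
( $ProcessEnd - $ProcessBegin ).TotalMilliseconds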

To view the rudimentary outcome, have a look in the file configured in the script. Timings will be listed as a long list of durations (in milliseconds), each set definition followed by its ten query timings.

 

Results

With this data, you can put your scheme to the test and find out which partition scheme, if any at all, works best for your particular cube.
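
If you prefer a quick summary over eyeballing the raw list, a rough sketch like the following (assuming the Results.txt file produced above, where each set definition is followed by its timings) averages the query timings per partition scheme:

$Lines = Get-Content "C:\SSAS\Output\Results.txt"
$CurrentSet = ""
$Lines | ForEach-Object {
	if( $_ -match '^\[' ) { $CurrentSet = $_ }       # a set definition starts a new block
	elseif( $_ -match '^\d' ) { [pscustomobject]@{ Set = $CurrentSet; Ms = [double]$_ } }
} | Group-Object Set | ForEach-Object {
	[pscustomobject]@{ Set = $_.Name; AverageMs = [math]::Round( ( $_.Group.Ms | Measure-Object -Average ).Average, 2 ) }
} | Format-Table -AutoSize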


Using PowerShell to get current Data/Log/Temp and Backup Directories from SSAS

In a blog post from October 2015 I ranted about not being able to get the Data, Log, Temp and Backup directories from the AMO API. Back then, I was tasked with mapping all of the possible locations for SSAS-related files on a number of servers.
Obviously, a single attribute per location wouldn’t cut it, as locations may have been changed over time. At least that is what I observed on this setup. So, back then, I needed something more sophisticated.
Since I thought this was a shortcoming of the AMO API, I filed a Connect item in order for Microsoft to rectify it.

Just recently a reply was made to the Connect item, highlighting the fact that the current values of the Data/Log/Temp and Backup directories – meaning the currently configured values – are exposed through the Server.ServerProperties collection. According to the answer, only public property values are exposed.

Using PowerShell, we can now retrieve the desired information from any given instance of Analysis Services. Doing so would look something like this:

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null

$serverName = "<insert server name here>";
$Server = New-Object Microsoft.AnalysisServices.Server;
$Server.Connect($serverName);

# Collect the currently configured directories from the ServerProperties collection
$prop_values = @{
    DataDir   = $Server.ServerProperties["DataDir"].Value;
    TempDir   = $Server.ServerProperties["TempDir"].Value;
    BackupDir = $Server.ServerProperties["BackupDir"].Value;
    LogDir    = $Server.ServerProperties["LogDir"].Value;
};

$Server.Disconnect();

$prop_values;
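
If you are curious about what else is exposed, a small self-contained sketch like this lists every public server property and its current value (same placeholder as above):

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") | Out-Null

$Server = New-Object Microsoft.AnalysisServices.Server;
$Server.Connect("<insert server name here>");

# Dump the whole ServerProperties collection
$Server.ServerProperties | Select-Object Name, Value | Format-Table -AutoSize;

$Server.Disconnect();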

 


Using PowerShell to Discover Temp/Log/Backup and Data Directories of SSAS

Objective: Locate all folders related to SQL Server Analysis Services (SSAS) on disclosed servers.

During a massive cleanup of some of our environments, I was asked to point out all the possible locations of SSAS-related files on a list of 10+ servers. The main reason this is not a trivial task that can be done blindfolded is that, at some point in time, someone decided to change the default value in the SSAS configuration. This change led to new cubes being deployed to a different location than the existing ones. Also contributing to the mess was a dubious naming standard, which makes guessing the folder names difficult.

My immediate strategy was to check whether I could use Analysis Services Management Objects (AMO) to get the current values from the configuration of each instance. I happen to be very fond of the AMO API, so this was my first instinct. Unfortunately there were several drawbacks and shortcomings to this approach, as only the Data directory is exposed, and I would have no way of telling the historic values of each property (TempDir, LogDir, BackupDir and DataDir).

Asking the question on Twitter (using the hashtag ssashelp), whether there was some clever way of getting this information, returned only a few responses, from Bill Anton (b|l|t) and Chris Webb (b|l|t) – but nothing directly solving the issue at hand. Bill shared this link on looking up Instance Configuration via PowerShell and Chris shared this one on Using MDX to browse the File System.

I think that the API should at least expose the current values of all possible configurations on an SSAS instance, and I have filed a Microsoft Connect ticket for this. But since this is not currently available, the only option was to script it myself.
The script produces a CSV file containing a list of servers with corresponding paths to all folders containing files of the specified extensions. In this example I have chosen to go for .abf (Analysis Services backup file), .dstore (compressed store, a decoding table for SSAS) and .trc (SSAS trace file). The script looks as follows:

cls;

# Setting up Parameters
$Path = "<Insert Valid Path and Filename w/.csv extension here";
$Include = '*.abf,*.trc,*.dstore'
$Servers= @( "<Insert Comma Seperated List of Server here>" )
$Output = @();

# Looping all servers listed
foreach ($Server in $Servers) {
    # Getting the logical disks on server
    $Disks = gwmi -ComputerName $Server Win32_LogicalDisk | where {$_.DriveType -eq '3'} 
    ForEach( $Disk in $Disks ) {                    
        # Getting all folders on the server's administrative share that contain the included file extensions
        $ChildItems = Get-ChildItem ("\\" + $Server + "\" + $Disk.Name.Replace(":", "$") + "\") -Recurse -ErrorAction SilentlyContinue -Include $Include
        # For each file, create an entry
        ForEach( $Item in $ChildItems ) { 
            $Temp = New-Object System.Object
            $Temp | Add-Member -MemberType NoteProperty -Name "Server" -Value $Server
            $Temp | Add-Member -MemberType NoteProperty -Name "Path" -Value $Item.Directory                        
            # Add Entry to list
            $Output += $Temp
        }         
    }
}

# Remove duplicates and keep only the Path member
$Result = $Output | Select-Object -Unique -Property Path

# Write to CSV file
$Result | Export-Csv -Path $Path -NoTypeInformation;

 

Obviously this script comes with the normal DON’T JUST COPY/PASTE INTO PRODUCTION disclaimer. Please read through the script and adapt it for your own use. $Path and $Servers should be set accordingly.

Lastly you need to be aware of security. Your account may not be granted the proper permissions to actually remotely crawl the files and folders – please make sure this is addressed in advance.
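
As a quick, optional pre-flight check (not part of the script above), something like this verifies that both WMI and the administrative shares respond for each server before kicking off the full crawl:

foreach ($Server in $Servers) {
    # Can we query WMI remotely?
    $WmiOk = [bool](Get-WmiObject -ComputerName $Server Win32_OperatingSystem -ErrorAction SilentlyContinue)
    # Can we reach the administrative share used by the crawl?
    $ShareOk = Test-Path ("\\" + $Server + "\c$")
    Write-Host "$Server - WMI: $WmiOk, Admin share: $ShareOk"
}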

Note also that this script will scan all logical drives on all listed servers, which can take some time.


Extracting SSAS Database(s) XMLA w/ PowerShell

For some time, we have been trying to get Informatica’s Metadata Manager to extract information from our SSIS, SSRS and SSAS instances in production. However, we were facing issues with credentials and service account clashes that led us to just extract the metadata and deploy it on a server where we had full control.

The easiest way of doing this was to create a PowerShell script that extracts, saves and deploys the XMLA from the databases in question. So we would just be looping through the database collection on a number of servers in order to get the required information.

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.AnalysisServices") >$NULL

$SourceServers = @( "<SOURCE SERVER LIST HERE>" ); #Source
$TargetServerName = "<TARGET SERVER HERE>"; #Target

cls;

$TargetServer = New-Object Microsoft.AnalysisServices.Server;
$TargetServer.Connect($TargetServerName);

ForEach( $srv in $SourceServers ) {
    $server = New-Object Microsoft.AnalysisServices.Server
    $server.connect($srv)
    ForEach(  $database in $server.Databases ) {
        
        $stringbuilder = new-Object System.Text.StringBuilder
        $stringwriter = new-Object System.IO.StringWriter($stringbuilder)
        $xmlOut = New-Object System.Xml.XmlTextWriter($stringwriter)
        $xmlOut.Formatting = [System.Xml.Formatting]::Indented

        $ScriptObject = New-Object Microsoft.AnalysisServices.Scripter

        $MSASObject=[Microsoft.AnalysisServices.MajorObject[]] @($database)

        $ScriptObject.ScriptCreate($MSASObject,$xmlOut,$false)

        $ServerPath = $srv -replace "\\", "_"

        $Path = "<INSERT DUMP LOCATION>" + ".xmla";        

        $stringbuilder.ToString() |out-file -filepath $Path 
        
        #$TargetServer.Execute($stringbuilder.ToString());
    }    
}

Basically there are a few steps you need to take in order to fit this script into your environment. Mind the list of sources: multiple servers can be added, although this has not been tested for name clashes on the target box.

If you don’t need a physical file for the archive, this step can obviously be skipped.

Finally, you may have to clear the database collection on the Target server before running the script. This was done as a one-off. That step is also disabled (commented out) in the script above – less destruction on account of copy/paste.
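
For reference, a rough (and very destructive) sketch of such a cleanup could look like the following – use it with extreme care, as it drops every database on the target instance:

# One-off cleanup: drop every database on the target instance (assumes $TargetServer is connected as above)
foreach ($db in @($TargetServer.Databases)) {
    $db.Drop()
}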

Obviously, if the cubes need to be processed at the Target location, this needs to be added as well. We were just after the metadata, hence no processing was needed.
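
If you do need processing, a hedged sketch that could be added inside the database loop (assuming the commented-out Execute call above has deployed the database to the target under its source name) would be:

# Full process of the freshly deployed database on the target instance
$Deployed = $TargetServer.Databases.FindByName($database.Name)
if ($Deployed -ne $null) {
    $Deployed.Process([Microsoft.AnalysisServices.ProcessType]::ProcessFull)
}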


Did someone restart the server overnight?

Sometimes, when you are running lengthy integration jobs overnight, you get into the office the next morning and are surprised that nothing has completed – and that, in fact, nothing is running anymore.

Today was such a day for me. Last night I started the execution of a package via remote desktop and the DTExec utility. This morning my job was gone from the session and nowhere to be found in Task Manager. So, did someone restart the server overnight? (It could even be a stupid automatic Windows Update setting rebooting the machine.)

I found this piece of PowerShell to tell if the Event Log service was stopped:

PS C:\> Get-EventLog System | Where-Object { $_.EventID -eq 6006 } | Select-Object -First 10

At least it quickly tells you whether the suspicion has something to it.
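
Another quick check (not part of the original one-liner) is to ask Windows directly when it last booted, via WMI:

PS C:\> Get-WmiObject Win32_OperatingSystem | ForEach-Object { [System.Management.ManagementDateTimeConverter]::ToDateTime($_.LastBootUpTime) }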
