SSIS Script Task and Proxy Auto Configuration (PAC) Script

Actually this post could have been divided into several topics, but currently time does not permit me to write that often. So this will be the whole shabang at once.

We needed to import data from an external source, using Java Script Object Notation [JSON], through Microsoft SQL Server Integration Services [SSIS]. Using a Script Task to set up the connection , retrieve the HttpWebRequest and passing the data along to SQL Server was, in broad terms, the way to solve the task. Now, along the way a Proxy Auto Configuration [PAC] Script was introduced, which complicated things a bit.

A PAC is a script that let’s for example Internet Explorer, or any other agent, to choose the appropriate proxy to use, for the current context. An example of a PAC script could look like this:

function FindProxyForURL(url, host) {
 // our local URLs from the domains below example.com don't need a proxy:
 if (shExpMatch(host, "*.example.com"))
 {
 return "DIRECT";
 }

 // URLs within this network are accessed through
 // port 8080 on fastproxy.example.com:
 if (isInNet(host, "10.0.0.0", "255.255.248.0"))
 {
 return "PROXY fastproxy.example.com:8080";
 }

 // All other requests go through port 8080 of proxy.example.com.
 // should that fail to respond, go directly to the WWW:
 return "PROXY proxy.example.com:8080; DIRECT";
}

The quick reader, will already have spotted, that such a PAC Script can indeed return several proxy URLs, separated by semicolon ‘;’. Ours did, in our case. I was ignorantly expecting a single URL to be returned, and spent some ½-hour trying to fix things here and there, without actually getting to the cause. Thanks MessageBox.Show(…) 🙂

So, how to integrate this into the Script Task of SSIS?

A piece of code on Code-Project managed to get me up and running in a jiff, so please have a look at: http://www.codeproject.com/Articles/12168/Using-PAC-files-proxy

The article explains how to set a proxy using PAC files. The solution presented in the article uses the WinHttp.dll for obtaining a proxy URL.
When this code is embedded into the Script Task, it’s a stroll in the park to get your data flowing.

HttpWebRequest request = null;
HttpWebResponse response = null;

string listOfServers = GetProxyForUrlUsingPac(Variables.SourceURL, "http://www.yourdomain.com/pac/pacscriptname.pac");
WebProxy p = new WebProxy("http://" + listOfServers.Split(new char[] { ';' })[0]);
request = (HttpWebRequest)WebRequest.Create(Variables.SourceURL);
request.Proxy = p;
response = (HttpWebResponse)request.GetResponse();

Stream s = response.GetResponseStream();
StreamReader sr = new StreamReader(s);
string json = sr.ReadToEnd();
//DO Whatever you want with the JSON stream

Now, this has every mark of a quick ‘n dirty, so you might want to make it a lot more robust. The list of proxy servers I receive could be put to a better use, rather tahn just picking the first one… But I hope this helps shed some light onto how to connect SSIS Script Task through a proxy, using a PAC Script.

Loading

Tool – Data Splitter

Today I had to find just that one record that was causing a bulk insert to fail.The file that was being loaded, was 1.2GB in size or approx 7M rows.
In order to handle that kind of file, I decided to break it up into smaller pieces.
A sound divide and conquer plan was devised, at every step, dividing the file with the bad record(s) in two, until the files were of a such size, that it would be feasible to go through them by hand.

To accomplish this, I wrote a little tool in C#, that takes a file, which needs to be row-delimited by carriage return ‘/r/n’. A setting allows the user to control how many parts the files is be split into. And that is pretty much it. The files are named the same as the original file, only a number is appended, to tell them apart.
The difference to existing tools is, that this tool splits the file, not in the middle of a record, but with a nice clean cut between rows.

Download the VS2010 project here: DataSplitter
There is no fancy features involved, no multi threading or any of that C# glamour stuff. So don’t get your hopes up high in regards to flashy performance.
If you think the tool is missing key functionality, please let me know.

Loading