Today I had to find just that one record that was causing a bulk insert to fail. The file being loaded was 1.2 GB in size, or approx 7 million rows.
In order to handle that kind of file, I decided to break it up into smaller pieces.
A sound divide-and-conquer plan was devised: at every step, divide the file containing the bad record(s) in two, until the pieces were small enough to go through by hand.
To accomplish this, I wrote a little tool in C# that takes a file whose rows are delimited by carriage return/line feed (‘\r\n’). A setting lets the user control how many parts the file is split into. And that is pretty much it. The output files are named the same as the original file, with a number appended to tell them apart.
The difference from existing tools is that this tool does not split the file in the middle of a record, but makes a nice clean cut between rows.
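The core idea can be sketched in a few lines of C#. This is a minimal illustration of the clean-cut approach, not the actual project source; the class and method names are my own for this example, and the real tool's internals may differ.

```csharp
using System;
using System.IO;

public class DataSplitter
{
    // Split a '\r\n'-delimited file into 'parts' pieces, cutting only at
    // row boundaries. Output files are named <original>.1, <original>.2, etc.
    public static void Split(string path, int parts)
    {
        // Approximate target size per part; the actual cut happens at the
        // first row boundary after this size is reached.
        long targetSize = new FileInfo(path).Length / parts;
        int partNumber = 1;
        long written = 0;

        using var reader = new StreamReader(path);
        var writer = NewPart(path, partNumber);
        string? line;
        while ((line = reader.ReadLine()) != null)
        {
            // Start the next part only after a complete row has been written,
            // so no record is ever split across two files.
            if (written >= targetSize && partNumber < parts)
            {
                writer.Dispose();
                writer = NewPart(path, ++partNumber);
                written = 0;
            }
            // Write the delimiter explicitly to preserve '\r\n' on any OS.
            writer.Write(line + "\r\n");
            written += line.Length + 2; // +2 for the '\r\n' delimiter
        }
        writer.Dispose();
    }

    static StreamWriter NewPart(string path, int number) =>
        new StreamWriter(path + "." + number);

    public static void Main(string[] args) =>
        Split(args[0], int.Parse(args[1]));
}
```

Because the cut point is deferred to the next row boundary, part sizes are only approximately equal, which is fine for bisecting down to a bad record.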
Download the VS2010 project here: DataSplitter
There are no fancy features involved, no multithreading or any of that C# glamour stuff. So don’t get your hopes up in regards to flashy performance.
If you think the tool is missing key functionality, please let me know.