Selecting an arbitrary number of random rows from a table with T-SQL

While searching for a fairly large dataset to use for performance baselining, I was looking at the option of duplicating data in the Adventure Works database to match our needs.
In order to check for weak spots, I need at least 10M fact rows and preferably at least one dimension with approximately 1M members.
One approach would be to generate the cross product of all customers by first-, mid- and last name, this makes app. 11M records. These members could the be matched to an abitrary number of facts.

So, how to get an arbitrary number of fact rows:

  1. BEGIN TRANSACTION
  2. DECLARE @FirstName nvarchar(50), @MiddleName nvarchar(50), @LastName nvarchar(50), @SalesOrderNumber NVARCHAR(2) = ‘SO’
  3. DECLARE @MaritalStatus nchar(1)
  4. DECLARE @Gender nvarchar(1)
  5. DECLARE @RecordCount INT, @GeographyKey INT, @CustomerKey INT, @SONumber INT = 100000
  6. DECLARE customer_cursor CURSOR FAST_FORWARD
  7.     FOR SELECT FirstName, MiddleName, LastName FROM dbo.AllCustomers
  8. OPEN customer_cursor
  9. FETCH NEXT FROM customer_cursor
  10. INTO @FirstName, @MiddleName, @LastName;
  11. WHILE @@FETCH_STATUS = 0
  12.     BEGIN    
  13.         SET @RecordCount = FLOOR( CAST( DATEPART(MILLISECOND, GETDATE() ) AS DECIMAL ) / 4 )
  14.         SET @GeographyKey =( SELECT TOP 1 GeographyKey FROM DimGeography ORDER BY NEWID() )
  15.         SET @Gender =( SELECT TOP 1 Gender FROM DimCustomer WHERE FirstName = @FirstName ORDER BY NEWID() )
  16.         SET @MaritalStatus =( SELECT TOP 1 MaritalStatus FROM DimCustomer ORDER BY NEWID() )
  17.         SET @SONumber = @SONumber + 1
  18.         INSERT INTO dbo.DimCustomer
  19.             (
  20.                 CustomerAlternateKey
  21.                 , GeographyKey
  22.                 , FirstName
  23.                 , MiddleName
  24.                 , LastName
  25.                 , MaritalStatus
  26.                 , Gender            
  27.                 , IsModified
  28.              )
  29.             VALUES
  30.             (
  31.                 CAST( LEFT( newid(), 15 ) AS nvarchar(15) )
  32.                 , @GeographyKey
  33.                 , @FirstName
  34.                 , @MiddleName
  35.                 , @LastName
  36.                 , @MaritalStatus
  37.                 , @Gender
  38.                 , 1
  39.             )
  40.         SELECT @CustomerKey = SCOPE_IDENTITY()
  41.         DECLARE @SQL VARCHAR(MAX) =
  42.         ‘        
  43.         WITH Data AS
  44.         (
  45.             SELECT TOP ‘ + CAST( @RecordCount AS VARCHAR(4) ) + ‘ f.ProductKey, OrderDateKey, f.DueDateKey, f.ShipDateKey, ‘ + CAST( @CustomerKey AS VARCHAR ) + ‘ AS CustomerKey, f.PromotionKey, f.CurrencyKey, f.SalesTerritoryKey,  ”’ + @SalesOrderNumber + CAST( @SONumber AS VARCHAR(10) ) + ”’ As SalesOrderNumber, f.SalesOrderLineNumber, f.RevisionNumber, f.OrderQuantity, f.UnitPrice, f.ExtendedAmount, f.UnitPriceDiscountPct, f.DiscountAmount, f.ProductStandardCost, f.TotalProductCost, f.SalesAmount, f.TaxAmt, f.Freight, f.CarrierTrackingNumber, f.CustomerPONumber FROM dbo.FactInternetSales f ORDER BY NEWID()
  46.         )
  47.         INSERT INTO dbo.FactInternetSales
  48.         SELECT
  49.             ProductKey
  50.             , OrderDateKey
  51.             , DueDateKey
  52.             , ShipDateKey
  53.             , CustomerKey
  54.             , PromotionKey
  55.             , CurrencyKey
  56.             , SalesTerritoryKey
  57.             , SalesOrderNumber
  58.             , ( RANK() OVER ( ORDER BY NEWID() ) ) AS SalesOrderLineNumber
  59.             , RevisionNumber
  60.             , OrderQuantity
  61.             , UnitPrice
  62.             , ExtendedAmount
  63.             , UnitPriceDiscountPct
  64.             , DiscountAmount
  65.             , ProductStandardCost
  66.             , TotalProductCost
  67.             , SalesAmount
  68.             , TaxAmt
  69.             , Freight
  70.             , CarrierTrackingNumber
  71.             , CustomerPONumber
  72.         FROM Data’
  73.         –PRINT (@SQL)
  74.         EXEC(@SQL)
  75.         FETCH NEXT FROM customer_cursor
  76.         INTO @FirstName, @MiddleName, @LastName;
  77.     END
  78. CLOSE customer_cursor;
  79. DEALLOCATE customer_cursor;
  80. COMMIT TRANSACTION

Running this script will generate between 1 and 275 million fact rows, depending on how long you let it run.

Alert:
This script is put together in the least amount of time possible, with absolutely no consideration towards performance.runs incredibly slow, so any hints on how to make this run in less than a week are highly appreciated! 😉

Alert (again)
This script will run for days!

Loading

Leave a Reply

This site uses Akismet to reduce spam. Learn how your comment data is processed.