Inspired by: Steve Novoselac’s Post on SqlBulkCopy
I had a lot of data that I wanted to dump from a wide range of flat text file databases. I already had classes that stored the schema for the tables and some of the formats were weird (i.e. custom date stored as a single), and rather than setting up a bulk insert in SQL Server, I decided I’d just import directly using my classes. They’re optimized to load, cache, and interpret the data but I just needed a way to get it quickly over to the SQL table.
If you have a similar situation or your program is generating a lot of data (over a couple thousand rows) that needs to be dumped into a table, SqlClient.SqlBulkCopy is the way to go.
Slower Methods
Before I used bulk insert, I was using a SqlCommand with parameters (e.g. “INSERT INTO MyTable VALUES (@KeyValue, @Value1, @Value2)“). From what I understand, SQL Server compiles these and so long as the command text doesn’t change (not even character casing), it doesn’t have to compile it again between calls.
While this method was definitely faster than composing an INSERT for each row, I was averaging only a couple or few hundred records a second and I had hundreds of thousands of records.
I got a little performance boost when I waited until the end to add my primary key index, but of course that takes some time depending on the number of rows you have.
With using SqlBulkCopy, my performance jumped tenfold to a couple or few thousand a second. At first I tried caching all the data in a DataTable (about 630,000 records), and the bulk insert at the end took about 25 to 35 seconds. However, the DataTable just ate through memory, taking over a GB when everything was loaded. It would also pause frequently near the end when allocating more memory and causing use of the swap file.
Here’s the code for after the DataTable (dt) has been filled. Also, c represents an open connection, but you can also use a connection string.
Dim bc As SqlBulkCopy
bc = New SqlBulkCopy(c, SqlBulkCopyOptions.TableLock, Nothing)
bc.DestinationTableName = String.Format("[{0}].[dbo].[{1}]", database, tableName)
bc.BatchSize = 1000
bc.BulkCopyTimeout = 300
bc.WriteToServer(dt)
bc.Close()
- BatchSize - Can be the full number of rows, but it seems like performance is a little better when you give the server a chance to process the records as they come in.
- BulkCopyTimeout - The default is 30 seconds. If you have a lot of records you’ll blow past this so make sure to set it higher.
dt.Dispose(): GC.Collect() – You should consider calling this once you’re done to free up some memory from your DataTable.
Fastest
My final, fastest solution was to create a class that implemented the IDataReader interface and that consumed my data. It would load my data from disk as requested by SqlBulkCopy which was then sent to the server. Memory usage was minimal, maybe from 50 to 70MB according to Task Manager and performance was even better since there was no memory allocation hiccups.
Dim bc As SqlBulkCopy
bc = New SqlBulkCopy(c, SqlBulkCopyOptions.TableLock, Nothing)
bc.DestinationTableName = String.Format("[{0}].[dbo].[{1}]", database, tableName)
bc.BatchSize = 1000
bc.BulkCopyTimeout = 3000
bc.WriteToServer(New RecBaseDataReader(rec, recsToExport))
bc.Close()
- RecBaseDataReader is my custom class. I pass it
rec which is used to load data and recsToExport which is a list of records that it can iterate through.
- BulkCopyTimeout - Note that it’s much higher than when we pass in a DataTable because of the extra time that we take loading the data in the IDataReader.
You can set up your IDataReader class any way that works best for you. The following routines are used by SqlBulkCopy and must be implemented. You can just put a Stop in the rest of the routines just in case anyone tries to use them.
- System.Data.IDataRecord.FieldCount – Should return the number of fields in your record.
- System.Data.IDataRecord.GetOrdinal – Should return the ordinal for the given field.
- System.Data.IDataReader.Read – Should advance the read cursor to the next record and load it. Make sure it’s set up so the first call of Read will return the first record.
- System.Data.IDataRecord.GetValue – Should return the value of the column specified by the given ordinal.
Here’s a skeleton implementation: MyCustomDataReader.zip
Notes
- Destination table name is kind of finicky. When I used just the table name, it threw an error so I just went ahead and used the full table name.
- If you have computed columns on your source or destination, you’ll need to set up column mappings to specify which columns should actually be inserted. A straightforward
.ColumnMappings.Add(col.ColumnName, col.ColumnName) worked for me. Others have suggested that you can use an intermediate table which provides a little protection if the bulk insert fails for some reason.
- For the method where I loaded all the data into a DataTable, I set up columns and column types, but didn’t set up MaxLength for varchar columns. This caused an error so make sure these are set up correctly. If you’re using a strongly-typed DataTable generated from the destination table you shouldn’t have to worry about this.
- Check out SqlBulkCopyOptions (specified in the constructor) for more settings on things like how identity columns are handled, how null values are copied to the new table, etc.
- Apparently column names are case-sensitive in SqlBulkCopy. I didn’t encounter any issues with this because I was creating the table at the same time I was setting up column mappings so the casing matched anyways.
- Some are saying there’s a bug in .NET 2.0 where a quote character in the column name will prevent you from using SqlBulkCopy so heads up.
2011-07-27 Edit: Uploaded skeleton class as a zip file to avoid 404 error because of “.vb” extension.