Bulk Insert Using SqlClient.SqlBulkCopy

Inspired by: Steve Novoselac’s Post on SqlBulkCopy

I had a lot of data that I wanted to dump from a wide range of flat text file databases. I already had classes that stored the schema for the tables, and some of the formats were weird (e.g. a custom date stored as a Single), so rather than setting up a bulk insert in SQL Server, I decided I’d just import directly using my classes. They’re optimized to load, cache, and interpret the data; I just needed a way to get it over to the SQL table quickly.

If you have a similar situation or your program is generating a lot of data (over a couple thousand rows) that needs to be dumped into a table, SqlClient.SqlBulkCopy is the way to go.

Slower Methods

Before I used bulk insert, I was using a SqlCommand with parameters (e.g. [sql]INSERT INTO MyTable VALUES (@KeyValue, @Value1, @Value2)[/sql]). From what I understand, SQL Server compiles a plan for these, and so long as the command text doesn’t change (not even character casing), it doesn’t have to compile it again between calls.
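For reference, here’s a rough sketch (not my exact code) of that kind of parameterized insert loop. [generic]MyRecord[/generic] and [generic]records[/generic] are placeholders for whatever holds your data, and [generic]c[/generic] is an open connection, as in the examples below.

[generic]
' Reuse one parameterized command so SQL Server keeps reusing the compiled plan
Dim cmd As New SqlCommand( _
    "INSERT INTO MyTable VALUES (@KeyValue, @Value1, @Value2)", c)
cmd.Parameters.Add("@KeyValue", SqlDbType.Int)
cmd.Parameters.Add("@Value1", SqlDbType.VarChar, 50)
cmd.Parameters.Add("@Value2", SqlDbType.VarChar, 50)

For Each r As MyRecord In records   ' MyRecord/records are placeholders
    cmd.Parameters("@KeyValue").Value = r.Key
    cmd.Parameters("@Value1").Value = r.Value1
    cmd.Parameters("@Value2").Value = r.Value2
    cmd.ExecuteNonQuery()
Next
[/generic]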

While this method was definitely faster than composing a new INSERT statement for each row, I was averaging only a few hundred records per second, and I had hundreds of thousands of records.

I got a little performance boost when I waited until the end to add my primary key index, but of course that takes some time depending on the number of rows you have.

With SqlBulkCopy, my performance jumped roughly tenfold to a few thousand records per second. At first I tried caching all the data in a DataTable (about 630,000 records), and the bulk insert at the end took about 25 to 35 seconds. However, the DataTable just ate through memory, taking over a GB when everything was loaded. It would also pause frequently near the end as it allocated more memory and pushed the system into the swap file.

Here’s the code for after the DataTable ([generic]dt[/generic]) has been filled. Also, [generic]c[/generic] represents an open connection, but you can also use a connection string.

[generic]
Dim bc As SqlBulkCopy
bc = New SqlBulkCopy(c, SqlBulkCopyOptions.TableLock, Nothing)
bc.DestinationTableName = String.Format("[{0}].[dbo].[{1}]", database, tableName)
bc.BatchSize = 1000
bc.BulkCopyTimeout = 300
bc.WriteToServer(dt)
bc.Close()
[/generic]

  • BatchSize – Can be the full number of rows, but it seems like performance is a little better when you give the server a chance to process the records as they come in.
  • BulkCopyTimeout – The default is 30 seconds. If you have a lot of records you’ll blow past this so make sure to set it higher.
  • [csharp]dt.Dispose(): GC.Collect()[/csharp] – You should consider calling these once you’re done to free up the memory held by your DataTable.
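In case it helps, here’s roughly what setting up and filling [generic]dt[/generic] might look like. The column names and types here are made up; just note the MaxLength on the string column (see the Notes section below).

[generic]
' Hypothetical column layout - match names, types, and lengths to the destination table
Dim dt As New DataTable(tableName)
dt.Columns.Add("KeyValue", GetType(Integer))
Dim nameCol As DataColumn = dt.Columns.Add("Name", GetType(String))
nameCol.MaxLength = 50   ' matches a varchar(50) column in the destination
dt.Columns.Add("Amount", GetType(Double))

' ...then one Rows.Add per source record...
dt.Rows.Add(123, "Example", 4.56)
[/generic]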

Fastest

My final, fastest solution was to create a class that implemented the IDataReader interface and consumed my data, loading records from disk as SqlBulkCopy requested them and passing them along to the server. Memory usage was minimal, maybe 50 to 70MB according to Task Manager, and performance was even better since there were no memory allocation hiccups.

[generic]
Dim bc As SqlBulkCopy
bc = New SqlBulkCopy(c, SqlBulkCopyOptions.TableLock, Nothing)
bc.DestinationTableName = String.Format("[{0}].[dbo].[{1}]", database, tableName)
bc.BatchSize = 1000
bc.BulkCopyTimeout = 3000
bc.WriteToServer(New RecBaseDataReader(rec, recsToExport))
bc.Close()
[/generic]

  • RecBaseDataReader is my custom class. I pass it [generic]rec[/generic], which is used to load the data, and [generic]recsToExport[/generic], a list of records that it iterates through.
  • BulkCopyTimeout – Note that it’s much higher than when we pass in a DataTable because of the extra time spent loading the data inside the IDataReader.

You can set up your IDataReader class any way that works best for you. The following routines are the ones SqlBulkCopy actually calls, so they need real implementations. You can just put a [generic]Stop[/generic] in the rest of the routines in case anything ever tries to use them.

  • System.Data.IDataRecord.FieldCount – Should return the number of fields in your record.
  • System.Data.IDataRecord.GetOrdinal – Should return the ordinal for the given field.
  • System.Data.IDataReader.Read – Should advance the read cursor to the next record and load it. Make sure it’s set up so the first call of Read will return the first record.
  • System.Data.IDataRecord.GetValue – Should return the value of the column specified by the given ordinal.

Here’s a skeleton implementation: MyCustomDataReader.zip

Notes

  • The destination table name is kind of finicky. When I used just the table name, it threw an error, so I went ahead and used the fully qualified [database].[dbo].[table] name.
  • If you have computed columns on your source or destination, you’ll need to set up column mappings to specify which columns should actually be inserted. A straightforward [csharp].ColumnMappings.Add(col.ColumnName, col.ColumnName)[/csharp] worked for me (see the sketch after this list). Others have suggested using an intermediate table, which provides a little protection if the bulk insert fails for some reason.
  • For the method where I loaded all the data into a DataTable, I set up columns and column types but didn’t set MaxLength for the varchar columns. This caused an error, so make sure these are set correctly. If you’re using a strongly typed DataTable generated from the destination table, you shouldn’t have to worry about this.
  • Check out SqlBulkCopyOptions (specified in the constructor) for more settings on things like how identity columns are handled, how null values are copied to the new table, etc.
  • Apparently column names are case-sensitive in SqlBulkCopy. I didn’t encounter any issues with this because I was creating the table at the same time I was setting up column mappings, so the casing matched anyway.
  • Some are saying there’s a bug in .NET 2.0 where a quote character in a column name will prevent you from using SqlBulkCopy, so heads up.
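For the column mapping note above, something along these lines worked for me (this assumes the DataTable’s column names match the destination table’s):

[generic]
' Map each source column to the destination column of the same name
For Each col As DataColumn In dt.Columns
    bc.ColumnMappings.Add(col.ColumnName, col.ColumnName)
Next
[/generic]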

2011-07-27 Edit: Uploaded skeleton class as a zip file to avoid 404 error because of “.vb” extension.

10 thoughts on “Bulk Insert Using SqlClient.SqlBulkCopy”

  1. Hello,
    I am very interested in your article.
    Nevertheless, there is something I do not understand.
    My need is to copy tables from one server to another using SqlBulkCopy. Until today, everything was running fine, but now I have a performance issue: the biggest tables can’t be copied (500k to 10M rows); it lasts forever (table lock on, different BatchSize values, a huge timeout).

    That’s why I am here 🙂

    I took a look at the Dispose() method; I didn’t set up such a thing to free memory, so this is maybe my problem. I will try it tonight.

    My question is about your final method, the fastest one:
    bc.WriteToServer(New RecBaseDataReader(rec, recsToExport))
    Did you create the RecBaseDataReader class yourself to manage a better way to copy rows?
    Can you explain how it works?
    My script takes 2 hours to copy 10M rows, so if you have a faster way to do that, I would love it and it will probably solve my problem.

    Thanks and Regards.

  2. I did create the RecBaseDataReader to implement the IDataReader interface. I updated the post with a sample skeleton for an implementation that you might find handy: MyCustomDataReader.vb.

    By the way, if you’re just doing a simple copy between two servers, it sounds like you might want to look into OPENROWSET.

  3. Sorry, thought I double-checked it. I had uploaded it, but the web server doesn’t allow “.vb” extension. I reuploaded as zip and all looks well.

  4. Thank you so much for this well done and concise example. The MyCustomDataReader skeleton is golden! Now I don’t have to load gigs of data into memory before using SqlBulkCopy to insert quickly.

  5. Hi,
    I’m trying to copy data from a DataTable to a database table, and the table has a primary key column. I have tried
    Using sqlBulkCopy As New SqlBulkCopy(conString, SqlBulkCopyOptions.KeepIdentity)
    but it’s not working. Kindly suggest another solution.
    thanks,
    prasad

  6. @Prasad, I’ve never used it with the KeepIdentity option, but some googling produced a lot of good results. You may have a bad column mapping or maybe you have another problem. Try searching for “SqlBulkCopyOptions.KeepIdentity” and the specific exception you’re getting.

  7. Guys, I am using SqlBulkCopy to insert from a DataTable. Will SqlBulkCopy maintain the row order as in the DataTable?

  8. Hi Nathan, nice post! I was wondering if, instead of using the SqlBulkCopy class, you tried doing something like generating a SQL script with your program and then running that script using the way mentioned here.

  9. The reason this interests me is performance. I want to know which method would be faster. I have to insert several text files, each containing from 100k up to 1.5 million rows and ranging between 20MB and 1.2GB in size.
