Sunday, February 19, 2012

For each container - performance problem?

I have a For Each container configured to iterate over the files in a directory (around 2600 files). The For Each container holds only one data flow task, and the data flow task is very simple - only 3 steps: read from the file using a flat file source (ragged right), format column values (e.g. converting YYYYMMDD into yyyy/mm/dd and trimming spaces - 13 such conversions) using a derived column transform, and insert them into a table (23 columns - mostly nvarchar, a few numerics and a few dates) using an OLE DB destination. The table has one non-unique clustered index on an nvarchar(18) column. Each of the 2600 files has on average 750,000 rows.
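For anyone reproducing this outside SSIS, the derived column work amounts to something like this Python sketch (the column layout and values here are made up for illustration - the actual package uses SSIS expressions, not Python):

```python
# Illustrative sketch of the per-row work the derived column transform does:
# reformat a YYYYMMDD string and trim fixed-width padding from each column.

def format_date(yyyymmdd: str) -> str:
    """Convert 'YYYYMMDD' to 'yyyy/mm/dd'."""
    return f"{yyyymmdd[0:4]}/{yyyymmdd[4:6]}/{yyyymmdd[6:8]}"

def clean_row(row: list) -> list:
    """Trim the padding that a ragged-right flat file leaves on each column."""
    return [col.strip() for col in row]

row = clean_row(["20051114  ", "  ACME CORP       ", " 42 "])
row[0] = format_date(row[0])
print(row)  # ['2005/11/14', 'ACME CORP', '42']
```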

It seems to me that memory deallocation doesn't happen after each iteration of the loop (I might be wrong). After a few iterations, sqlservr.exe is hogging all available memory (2GB in this case) while my dual processors are running at 3-6% of capacity. As a result, it is taking 6 hours to process fewer than 5 million rows.

I am surprised that for such a trivial operation, performance should degrade this much.

One thing that is suspicious is that the files reside on a USB 2 external drive.
Also, the database files are on the same drive. But still... why do the first few iterations zip through like missiles while the later ones seem to be competing with snails?

Any insight?

TIA,
Nitesh|||Do you use transactions? If yes, I suspect the cause of the problem is that all files are submitted as part of a single transaction - so SQL Server has to keep all the data until the transaction commits.

I would rather avoid transactions unless they are really needed, and use the SQL Destination for the bulk insert operation - it performs better than the generic OLE DB destination.|||Try loading to a raw file. You can append to the raw file each time around the loop. Then, simply insert the whole of the raw file contents at the same time.

-Jamie|||

Michael Entin SSIS wrote:

Do you use transactions? ...


No, I am not using Transactions.

Michael Entin SSIS wrote:

...use SQL Destination for bulk insert operation which performs better than generic OleDb Destination.


If I remember correctly, in CTP15/16 the SQL Destination was not an option for remote operations. I am not sure what holds true for the Nov RTM. I will give it a shot.

thanks,
Nitesh|||

Jamie Thomson wrote:

Try loading to a raw file. You can append to the raw file each time around the loop. Then, simply insert the whole of the raw file contents at the same time.

-Jamie

Interesting idea Jamie.
However, this will create a huge file (say, with 1.5 billion rows). I am not sure how loading that will impact memory, etc.

thanks,
Nitesh|||

Nitesh Ambastha wrote:

Jamie Thomson wrote:

Try loading to a raw file. You can append to the raw file each time around the loop. Then, simply insert the whole of the raw file contents at the same time.

-Jamie

Interesting idea Jamie.
However, this will create a huge file (say, with 1.5 billion rows). I am not sure how loading that will impact memory, etc.

thanks,
Nitesh

There's only one way to find out :)

I'll be brave and say it should be able to handle it. Remember, all it does is swap buffers in and out of memory - once the contents of a buffer are inserted into the target, the buffer "disappears".
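Jamie's point about buffers can be sketched in Python - memory use is bounded by the buffer size, not the file size, because each buffer is released once its rows have been handed off (file path and buffer size below are illustrative):

```python
# Sketch of the buffering behaviour described above: the file is consumed
# in fixed-size chunks, and only one chunk is held in memory at a time.

def read_in_buffers(path, buffer_rows=10000):
    """Yield the file's lines in chunks of at most buffer_rows."""
    buffer = []
    with open(path) as f:
        for line in f:
            buffer.append(line)
            if len(buffer) >= buffer_rows:
                yield buffer  # hand off a full buffer...
                buffer = []   # ...then let it "disappear"
    if buffer:
        yield buffer
```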

You may have to play with buffer sizes etc. to get optimal performance, but that sounds like an interesting piece of work if you ask me (perhaps that's just me!)

Let us know how it goes anyway. You should definitely try it.
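As a rough analogy to the raw-file staging approach (in plain Python rather than an SSIS raw file, with hypothetical paths): append every source file to one staging file inside the loop, then do a single load of the staging file instead of ~2600 separate insert passes.

```python
# Sketch of the staging idea: concatenate all source files into one
# staging file, so the expensive load happens once instead of per file.

def stage_files(paths, staging_path):
    """Append each source file's bytes to a single staging file."""
    with open(staging_path, "wb") as staged:
        for path in paths:
            with open(path, "rb") as src:
                staged.write(src.read())
```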

-Jamie|||

Nitesh Ambastha wrote:


If I remember correctly, in CTP15/16 the SQL Destination was not an option for remote operations. I am not sure what holds true for the Nov RTM. I will give it a shot.

thanks,
Nitesh

Correct. SQL Server Destination can only be used when the target is local.

-Jamie
