Tuesday, March 27, 2012

Foreach loop with parallel execution

Is it possible to get the iterations of a foreach loop to run in parallel? What I need to do is spawn an arbitrary number of parallel execution paths that all look exactly the same. The number equals the number of input files, which varies from run to run. Any help is appreciated!

Regards,
Lars Rönnbäck

Nope. This feature was available in some beta releases, so you may notice references to it in newsgroups and forums, but it was cut due to complexity and quality issues.

Thanks for the answer, albeit not what I was hoping for. Could it be "simulated" by having a loop containing only an Execute Package Task with the ExecuteOutOfProcess flag set to true?

Regards,
Lars

ExecuteOutOfProcess does not change the synchronous behavior of the Execute Package Task - the task still waits for the child package to finish.

If you want to start child packages truly asynchronously - i.e. start the child and continue execution of the parent package - use an Execute Process Task to start dtexec, specifying
application="cmd.exe" and
parameters="/c start dtexec.exe /f package file ..."
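The fire-and-forget effect of `cmd.exe /c start` can be sketched outside SSIS - a minimal Python illustration, where the dtexec invocation in the comment is a hypothetical example, not a tested command:

```python
import subprocess

def start_async(cmd):
    # subprocess.Popen returns as soon as the child process is spawned,
    # without waiting for it to finish - the same fire-and-forget effect
    # as launching dtexec via `cmd.exe /c start` from an Execute Process Task.
    return subprocess.Popen(cmd)

# Hypothetical SSIS usage (assumes dtexec.exe is on the server's PATH):
# start_async(["dtexec.exe", "/f", r"C:\packages\child.dtsx"])
```

Because the parent continues immediately, it cannot reliably collect the children's exit codes - the same caveat noted for the dtexec approach.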

You'll also need to configure the child package using DTEXEC's command line options (where I've left '...').

Note that
1) parent can't reliably get execution result from the children, since it may exit before all children finish,
2) in some cases you may even get worse performance than with sequential execution, since all these packages will compete for processor and memory.

Thank you very much for your help, Michael. I will try the proposed solution and compare its performance with running everything sequentially, which we might end up doing. When support for parallelism was included, I suppose it was done in a way that minimized contention for processors and memory, so my final question is: will it reappear in a later version or service pack?

Thanks,
Lars

3) You don't have any control over the degree of parallelism - if you have 100 files, you may only want to process three at a time, for example.
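Outside SSIS, that kind of throttling is straightforward - a hedged Python sketch, where the generic command lists stand in for full dtexec invocations:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

def run_throttled(commands, max_parallel=3):
    # Run external commands with at most `max_parallel` in flight at once;
    # in the SSIS scenario each command would be a complete dtexec
    # invocation, but the throttle itself is generic.
    def run_one(cmd):
        return subprocess.run(cmd).returncode
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return list(pool.map(run_one, commands))
```

Unlike the dtexec approach, this also lets the launcher collect every child's exit code.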

lasa wrote:

Thank you very much for your help, Michael. I will try the proposed solution and compare its performance with running everything sequentially, which we might end up doing. When support for parallelism was included, I suppose it was done in a way that minimized contention for processors and memory, so my final question is: will it reappear in a later version or service pack?

Thanks,
Lars

The smart money says this will appear in a later version. A lot of people are asking for it.

-Jamie

Wouldn't it be possible to achieve control over the degree of parallelism using a dummy package and the /MaxConcurrent flag of dtexec? Say I start four "real" packages in parallel using cmd.exe, passing /MaxConcurrent 4 as an option to dtexec, and then start one dummy package directly through dtexec with the option /MaxConcurrent 1. The way I have understood it, the dummy package will now be queued for execution and will start only when the number of parallel processes drops below 1, i.e. when all four "real" packages are finished?
I am going to try this out and will report back.
Regards,
Lars

Since the experiment above didn't work out the way I thought (the dummy package started regardless of the fact that four other packages were running), I am guessing that I have misunderstood the /MaxConcurrent option. Taken from BOL:

Specifies the number of executable files that the package can run concurrently. The value specified must be a non-negative integer, or -1. A value of -1 means that SSIS will allow a maximum number of concurrently running executables that is equal to the total number of processors on the computer executing the package, plus two.

What kind of executable files is the text referring to? Those that are called using the Execute Process Task within a package? It would have made more sense to me if the SSIS engine only allowed a certain number of concurrently running packages.

Regards,
Lars

Executables are the tasks within a single package. Neither the SSIS runtime nor DTExec does any interprocess communication to limit the number of packages or tasks running across process boundaries.

Matt
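Given Matt's clarification and the BOL quote above, the -1 default for this setting works out, within a single package, to a small calculation - sketched here in Python for illustration:

```python
import os

def default_max_concurrent_executables():
    # -1 means "number of processors on the machine, plus two";
    # per the thread above, this limits concurrent tasks within
    # one package, not the number of packages across processes.
    return os.cpu_count() + 2
```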
