Divide the work into many more chunks than you have CPUs. Each CPU starts with one chunk. As each CPU finishes, have it ask for another chunk as long as any remain.

That way, rather than having to estimate up front who will do how much work, you rely on feedback from how fast each worker actually finishes to keep every CPU busy.
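
A minimal sketch of that pattern in Perl, assuming Parallel::ForkManager is available. The chunk list and copy_chunk() are placeholders for however you split up and process your own files:

use strict;
use warnings;
use Parallel::ForkManager;

my $MAX_WORKERS = 4;    # roughly the number of CPUs; benchmark to tune
my @chunks = map { ["file$_.dat"] } 1 .. 40;    # many more chunks than workers

my $pm = Parallel::ForkManager->new($MAX_WORKERS);

for my $chunk (@chunks) {
    # start() blocks once $MAX_WORKERS children are running, so whichever
    # worker finishes first is the one that picks up the next chunk.
    $pm->start and next;
    copy_chunk($chunk);    # the actual work happens in the child
    $pm->finish;
}
$pm->wait_all_children;

sub copy_chunk {
    my ($files) = @_;
    warn "copying @$files\n";    # placeholder for the real copy
}

The queue feedback comes for free here: the parent hands out the next chunk only when a worker slot frees up, so fast workers naturally take more chunks than slow ones.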

Incidentally, most "copy operations" tend to be I/O bound, not CPU bound. Benchmark it. But I suspect you'll find that the optimal number of jobs to run at once is determined by latency to disk, not by the number of CPUs. You also really want to find ways to make sure the disk is accessed sequentially as much as possible. Seek time tends to be pathetic, say about 0.01 seconds per seek. In a single read you can pull in a large chunk of data, which goes into a cache; further requests for data in that chunk do not trigger another seek.
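
To make the seek cost concrete, here is a quick back-of-the-envelope in Perl. The 0.01 s seek and 50 MB/s sequential throughput are assumed round numbers, not measurements; plug in your own disk's figures:

use strict;
use warnings;

my $seek_time  = 0.01;          # seconds per seek (assumed)
my $throughput = 50 * 1024**2;  # bytes/second sequential (assumed)

# Sequential data you could have read in the time one seek takes:
printf "Each seek costs roughly %.0f KB of sequential reading\n",
    $seek_time * $throughput / 1024;

# Copying 1 GB with a seek per 4 KB block vs one sequential pass:
my $size = 1024**3;
printf "1 GB: ~%.0f s seeking per 4 KB block, ~%.0f s sequentially\n",
    ($size / 4096) * $seek_time, $size / $throughput;

With those numbers each seek throws away about half a megabyte of sequential reading, and the 1 GB copy comes out around 2600 seconds of seeking versus about 20 seconds streamed, i.e. two orders of magnitude.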

The back-of-the-envelope calculations that I did at [link|http://www.perlmonks.org/?node_id=381848|http://www.perlmonks.org/?node_id=381848] give an idea of how big a deal minimizing the number of seeks can be.

Cheers,
Ben