This type of work is very common in my environment.
What data needs to be shared amongst the processes if any?
Do you truly need to split the data, or can you jump to calculated start points? (See the sketch after these questions.)
Are you truly CPU-intensive, or do you bottleneck on IO?
How often does it run?
How much memory does a single process take?
Does the memory requirement scale with the data, or does it hit a limit of how much it needs and then remain static?
How is the core data shared? Do you have a bunch of compute servers sharing an NFS back end, or do you sling the data around via interprocess messages or a large shared segment?
How many processes can you kick off at once? Ie: is there an inherent limit to the data granularity, or can you keep sub-dividing?
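On the "calculated start points" question, here's a rough sketch of what I mean. This assumes fixed-size records so offsets are computable; the record size, record count, and worker_program are all made up for illustration:

    # Carve an 8 GB file of 512-byte records (16777216 records) into
    # 4 chunks by offset -- no physical split, no extra copy.
    CHUNKS=4
    PERCHUNK=`expr 16777216 / $CHUNKS`
    i=0
    while [ $i -lt $CHUNKS ]; do
        # skip= jumps dd straight to this worker's starting record
        dd if=source_data bs=512 skip=`expr $i \* $PERCHUNK` count=$PERCHUNK \
            | worker_program &
        i=`expr $i + 1`
    done
    wait

If the records are variable-length you'd need an index or a resync scan instead, but the point stands: seeking beats splitting.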
If NFS back-end, what OS / hardware?
If you move a LOT of data, have you closely monitored the system cycles to see where NFS starts to be the bottleneck?
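A quick way to eyeball that while a job runs (exact flags vary by OS; these are the common Solaris/BSD-style ones, so check your man pages):

    vmstat 5        # mostly idle/wait CPU during the run = IO bound, not CPU
                    # the swap columns also answer the memory questions above
    iostat 5        # per-disk throughput, run it on the NFS server itself
    nfsstat -c      # client side: retransmissions/timeouts point at net or server
    nfsstat -s      # server side: total calls vs. badcalls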
Have you considered bypassing NFS by moving the data using rsh pipes, ie:
"(rsh SERVER dd if=source_data ibs=4096k obs=8k) | dd ibs=8K obs=4096k | local_program". Note: The BS= are dependant on your network and TCP/IP stack. Test for the best mix by doing:
time (rsh SERVER dd if=source ibs=4096K obs=(1-64K) | dd of=/dev/null obs=4096k ibs=(1-64k)
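If you'd rather not test each size by hand, a sweep loop like the one below does it. Just a sketch: SERVER and source are the same stand-ins as above, and it assumes a shell whose "time" can time a whole pipeline (ksh/bash):

    for BS in 1k 2k 4k 8k 16k 32k 64k; do
        echo "trying obs/ibs=$BS"
        # sender's obs must match the receiver's ibs so each
        # network write maps to exactly one read on this end
        time ( rsh SERVER dd if=source ibs=4096k obs=$BS | dd of=/dev/null ibs=$BS obs=4096k )
    done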
If CPU intensive, are you using the latest and greatest Intel or AMD, or are you putting up with crappy old / slow Sun gear? A few compute server swapouts can make a HUGE difference when bottlenecking on the CPUs.
If Sun, and G-bit, are you using Jumbo frames? Doubtful unless you get 3rd party Gbit cards. You can triple your throughput and drop your kernel cycles dramatically if you do.
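Turning them on usually amounts to bumping the MTU, but treat this as illustrative only: the interface name here is a guess, the driver has to support it, and every hop (both hosts plus the switch) must agree:

    ifconfig ge0 mtu 9000      # ge0 is hypothetical; substitute your interface
    ping -s far_host 8972      # Solaris-style: 8972 data bytes + 28 header = 9000,
                               # verifies jumbo frames survive the whole path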
The list goes on and on.
Need more info!!!