[link|http://www.computerworld.com/hardwaretopics/hardware/server/story/0,10801,89037,00.html?f=x76|http://www.computerw...037,00.html?f=x76]
Low-latency data movement, faster than any "regular" CPU can read the data right now.
I foresee a mixture of faked SMP and NUMA based on InfiniBand. It'll give a single system image for ease of programming. Clusters will pick up on the next step.
For small-data, CPU-heavy compute tasks that partition well, grids are the most cost-effective.
But corporate programmers are lazy. They take a single system model, throw a few CPUs at it, and it seems to work. They don't have the budget or the expertise to test real scaling. They release it, it becomes business critical, and the performance tanks. Right now the only easy fix is SMP.
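To put rough numbers on "seems to work with a few CPUs", here's a minimal sketch using Amdahl's law. The 10% serial fraction is a made-up figure for illustration, not a measurement from any real application, and the function name is my own.

```python
def amdahl_speedup(serial_fraction: float, cpus: int) -> float:
    """Ideal speedup on `cpus` processors when `serial_fraction`
    of the work cannot run in parallel (Amdahl's law)."""
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / cpus)


# Hypothetical: 10% of the work is serialized (locks, shared state, etc.).
SERIAL_FRACTION = 0.10

for cpus in (1, 2, 4, 8, 16, 64):
    speedup = amdahl_speedup(SERIAL_FRACTION, cpus)
    efficiency = speedup / cpus  # fraction of each CPU doing useful work
    print(f"{cpus:3d} CPUs: {speedup:5.2f}x speedup, {efficiency:4.0%} efficiency")
```

Under that assumption, 4 CPUs give about 3.1x, which looks fine in a quick test, but 64 CPUs give only about 8.8x, and the speedup can never exceed 10x no matter how much hardware you add.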
I think we will hit a price sweet spot where 4-8 CPU boards are cheap and the next step up becomes prohibitively expensive compared to clustering. Mix in InfiniBand connections and you have nice building-block scalability.