For the first one, I'm thinking of just reading everything into memory and lazily writing changes. The lazy writer would probably be a second process to allow easy partitioning.
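To make the "read everything into memory, write lazily" idea concrete, here's a rough sketch. Everything in it is an assumption for illustration: the `WriteBehindStore` name, the JSON-lines change log, and the use of a background thread where the real thing would probably be a second process (swapping `queue.Queue` for `multiprocessing.Queue` would be the obvious next step).

```python
import json
import queue
import threading


class WriteBehindStore:
    """In-memory dict; changes are appended to a log file by a lazy writer.

    Hypothetical sketch: the class name and the on-disk format are made up.
    """

    def __init__(self, path):
        self.path = path
        self.data = {}
        self._load()  # read everything into memory up front
        self._pending = queue.Queue()
        self._writer = threading.Thread(target=self._drain, daemon=True)
        self._writer.start()

    def _load(self):
        # Replay the change log; later entries overwrite earlier ones.
        try:
            with open(self.path) as f:
                for line in f:
                    key, value = json.loads(line)
                    self.data[key] = value
        except FileNotFoundError:
            pass  # first run, nothing persisted yet

    def get(self, key, default=None):
        return self.data.get(key, default)

    def set(self, key, value):
        # Callers see the change immediately; disk catches up later.
        self.data[key] = value
        self._pending.put((key, value))

    def _drain(self):
        # The lazy writer: persist queued changes one at a time.
        while True:
            item = self._pending.get()
            if item is None:  # shutdown sentinel
                self._pending.task_done()
                break
            with open(self.path, "a") as f:
                f.write(json.dumps(list(item)) + "\n")
            self._pending.task_done()

    def flush(self):
        # Block until every queued change has been written.
        self._pending.join()

    def close(self):
        self._pending.put(None)
        self._writer.join()
```

Reads never touch the disk, and anything still sitting in the queue when the process dies is lost, which is the usual trade-off with lazy writes.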
The second bit adds the wrinkles.
A few things I've kicked around are:
1) separate processes for each instance, similar to the architecture above. The many small instances throw a wrench into that setup, though. Perhaps multiple processes, each handling some subset of the instances, assigned somewhat dynamically by load. There could be a lot of the smaller instances, and some of them could be as large as one of the main instances.
2) separate processes on the web tier servicing all instances, calling into a central shared cache by instance. The problem I run into there, however, is locking changes to the cache.
3) punt and just chuck it into a database. Worry about scaling later. ;-)
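For the locking problem in option 2, one possibility is a lock per instance rather than one lock for the whole cache, so changes to different instances never wait on each other. The sketch below is only that: `InstanceCache` and its methods are hypothetical names, and threads in a single process stand in for the web-tier processes. A real cross-process cache would have to do the equivalent locking on the cache side.

```python
import threading


class InstanceCache:
    """Shared cache with one lock per instance (hypothetical sketch)."""

    def __init__(self):
        self._data = {}   # instance_id -> {key: value}
        self._locks = {}  # instance_id -> threading.Lock
        self._registry_lock = threading.Lock()

    def _lock_for(self, instance_id):
        # The registry lock is held only long enough to find or create
        # the per-instance lock and its bucket.
        with self._registry_lock:
            if instance_id not in self._locks:
                self._locks[instance_id] = threading.Lock()
                self._data[instance_id] = {}
            return self._locks[instance_id]

    def get(self, instance_id, key, default=None):
        with self._lock_for(instance_id):
            return self._data[instance_id].get(key, default)

    def update(self, instance_id, key, fn, default=None):
        # Read-modify-write under the instance's own lock, so two
        # callers changing the same instance can't lose an update.
        with self._lock_for(instance_id):
            bucket = self._data[instance_id]
            bucket[key] = fn(bucket.get(key, default))
            return bucket[key]
```

Usage would look something like `cache.update("main-1", "hits", lambda n: n + 1, default=0)`. A busy instance only ever blocks callers touching that same instance.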
Another worry is the shared user information.