Ben, my understanding of how caching is supposed to work is more or less the same as yours. In practice, though, caching policies turn out to be all over the map, and heavily on the "excessive and inappropriate" side of that map. This can lead to frustrating situations. For example: you know you're going to migrate a public Web server, so you lower TTL values well in advance, perform the migration, and then start getting mail from people who unaccountably can't reach the new site, apparently because of inappropriate caching.
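To make the migration scenario concrete, here is a toy model (not a real resolver) of a cache that either honors a record's published TTL or clamps it up to some minimum. The `min_ttl` knob is a hypothetical stand-in for "excessive" caching policy, and the host name and addresses (RFC 5737 documentation ranges) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass
class CacheEntry:
    address: str
    expires_at: float  # absolute time (seconds) after which the entry is stale


class ResolverCache:
    """Toy DNS answer cache keyed by host name.

    min_ttl is a hypothetical knob: 0 honors the TTL as published; a positive
    value models a resolver that caches longer than the zone owner asked.
    """

    def __init__(self, min_ttl: int = 0) -> None:
        self.min_ttl = min_ttl
        self.entries: dict[str, CacheEntry] = {}

    def store(self, name: str, address: str, ttl: int, now: float) -> None:
        effective_ttl = max(ttl, self.min_ttl)  # clamp short TTLs upward
        self.entries[name] = CacheEntry(address, now + effective_ttl)

    def lookup(self, name: str, now: float,
               authoritative: dict[str, tuple[str, int]]) -> str:
        entry = self.entries.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.address  # cache hit: authoritative data is never consulted
        address, ttl = authoritative[name]  # miss or expired: re-query the zone
        self.store(name, address, ttl, now)
        return address


def simulate(min_ttl: int) -> str:
    # TTL was already lowered to 300 seconds ahead of the move.
    zone = {"www.example.com": ("192.0.2.10", 300)}
    cache = ResolverCache(min_ttl=min_ttl)
    cache.lookup("www.example.com", now=0, authoritative=zone)  # caches old address
    zone["www.example.com"] = ("198.51.100.20", 300)  # the migration happens
    # An hour later, a user's resolver is asked again.
    return cache.lookup("www.example.com", now=3600, authoritative=zone)


print(simulate(0))      # 198.51.100.20 (TTL honored: new site)
print(simulate(86400))  # 192.0.2.10 (TTL clamped to a day: old site)
```

Lowering the TTL only helps with resolvers that behave like the first case; anything resembling the second keeps sending users to the old address no matter what the zone says.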
Yes, what you're proposing sounds like a better approach, better even than the way the RFCs say caching is supposed to work. But real-world implementations appear to follow even the RFCs only spottily.
Why is mail the only protocol for which DNS has seen fit to implement failover?
Interesting idea. The one nearly unique characteristic of mail I can think of that might account for this is that mail is asynchronous, and so benefits from queueing and redelivery in a way other services generally don't. Still, it's a good thought. I can't think offhand of other services that would benefit from similar treatment, but there might be some.
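For comparison, a rough sketch of the sender-side behavior that MX records make possible: the sending host tries mail exchangers in ascending preference order and, if none accepts the message, queues it for a later retry. This is a simplified illustration, not any MTA's actual code; the `try_host` callback and host names are hypothetical, and it omits details such as randomizing among equal-preference hosts.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class MXRecord:
    preference: int  # lower value = more preferred
    exchange: str


@dataclass
class Outbox:
    queued: list[str] = field(default_factory=list)  # messages awaiting a later retry


def deliver(message: str, mx_records: list[MXRecord],
            try_host: Callable[[str, str], bool], outbox: Outbox) -> Optional[str]:
    """Try each mail exchanger in preference order; queue the message if none accept."""
    for mx in sorted(mx_records, key=lambda r: r.preference):
        if try_host(mx.exchange, message):
            return mx.exchange  # delivered: report which host accepted it
    outbox.queued.append(message)  # every MX failed: hold it for redelivery
    return None


records = [MXRecord(20, "backup.example.com"), MXRecord(10, "mail.example.com")]
down = {"mail.example.com"}  # hypothetical: primary exchanger is unreachable


def attempt(host: str, message: str) -> bool:
    return host not in down


outbox = Outbox()
print(deliver("first", records, attempt, outbox))   # backup.example.com
down.add("backup.example.com")
print(deliver("second", records, attempt, outbox))  # None
print(outbox.queued)                                # ['second']
```

The final branch, where the message simply waits in a queue, is the part that has no natural counterpart for an interactive request such as a Web page load.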
Rick Moen
rick@linuxmafia.com