I personally feel that the entire stack of people responsible for this screw-up should be held accountable:

The people that gave the OK to Hitachi. FIRST!

The people in charge of backups (tested backups).

The people that planned this event without properly recognizing the risks and therefore having a contingency plan set up.

The SAN Operators that allowed this to proceed without a proper SNAPSHOT saved off to another SAN, unless they were (provably) strong-armed.

Last but not least, the CTO and IT Managers, and possibly even the CEO, as the ultimate responsibility is his.



And yes, computers are KNOWN to be that unreliable. Google doesn't even fix broken machines in its data centers. Their policy is that *IF* a machine stops responding, a reboot request is sent through the "console setup" they have. If the machine comes back, it is automatically re-imaged and made "better". If it fails to respond, it is shut down and left to decay in place until the rack itself is removed.
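In code terms, that policy is a very short remediation loop with no human in it. Here's a minimal sketch in Python, assuming a hypothetical `console` object for the out-of-band reboot/re-image/power-off calls and a plain ping for the liveness check (none of Google's actual tooling is public, so every name here is illustrative):

```python
import subprocess
import time

REBOOT_WAIT_SECONDS = 300  # assumed grace period before declaring the box dead


def is_responding(host: str) -> bool:
    """Crude liveness check via ping; a real fleet would use its own health checks."""
    result = subprocess.call(
        ["ping", "-c", "1", "-W", "2", host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result == 0


def handle_unresponsive_machine(host: str, console) -> str:
    """One remote reboot, then either re-image the machine or write it off in place."""
    console.reboot(host)              # out-of-band reboot through the console setup
    time.sleep(REBOOT_WAIT_SECONDS)

    if is_responding(host):
        console.reimage(host)         # wipe and reinstall from a known-good image
        return "reimaged"

    console.power_off(host)           # shut it down and leave it to decay in the rack
    return "left-for-dead"
```

The point is that there is no branch anywhere in that flow where a person gets dispatched: the only two outcomes are "automatically re-imaged" or "written off".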

They figure it's going to cost them a minimum of $600 to send a warm body out there to find the machine, reboot it, and (possibly) fix it. Since the cost of a new machine for them is so low... it's not even worth the time and effort to deal with failures any other way than to ignore them.