
New rofl microsoft
http://www.pcworld.c...e_smartphone.html
This week, Microsoft announced that they had lost all Sidekick user data, including T-Mobile Sidekick pictures, contacts, calendars and other information, from Danger's servers. Since the devices sync with the servers, the devices also lost the data. The Sidekick data services had amazingly been out for over a week.
New Check how the business press reports it
http://www.reuters.c...N1213255320091012


Microsoft said in an emailed statement that the recovery process has been "incredibly complex" because it suffered a confluence of errors from a server failure that hurt its main and backup databases supporting Sidekick users.


Whiny SOBs, ain't they.
New Blame shifted to Hitachi
http://www.channelre...sidekick_hitachi/
It appears that the direct cause of the Sidekick data loss may have been storage area network remedial work outsourced to Hitachi Data Systems.

But I don't think they can blame Hitachi for this:
There was however, no backed-up or replicated data set...

A synced replica could have been vaporized if the damage was replicated to it, but no backup whatsoever??
New I was afraid of that....
Hitachi would NOT have gone forward without the "GO AHEAD" from the operators of the SAN... Danger/Microsoft.

That problem aside, Microsoft is the one that screwed up here with *NO* backups.

I am sure, though, that T-Mobile will get the brunt of the pushback, and they will be the ones losing the customers.

The blame for this lies SQUARELY on the person(s) who were supposedly backing up this data. It is the *TASK* of the backup people to test a restore from time to time to make sure things are good. This tells me it's Microsoft's subsidiary Danger.
New not so fast, with SAN connectivity getting smarter
Backups are done on the SAN with snapshots, not offline storage, UNLESS there is a DR requirement that calls for offline storage. Many folks are resistant to that because of the cost and speed of it, and don't do it. It's getting more common in data centers, though.
New Not so fast yourself...
Redundancy IS NOT a replacement for backups.

RAID IS NOT a replacement for backups.

SNAPSHOTS ON THE SAME SAN (and yes, I have used them to restore from) ARE NOT a replacement for backups.

Even though I have all that... I still have a backup solution... which then also replicates to geo-redundant storage.

'Tis no excuse to "rely" on a sure thing that has historically been shown to fail. RAID fails regularly, Redundancy fails regularly, SNAPSHOTS can be and have been proven bad regularly, backups have been proven to be bad, and geo-redundant storage has been proven to be bad in some cases (if the backup source is bad).

If you don't test... you'll never know.
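
For what it's worth, a restore test doesn't have to be fancy. Something along these lines is a minimal sketch of what I mean; the "restore-tool" command, the paths, and the checksum manifest are placeholders I made up, not anybody's actual setup:

#!/usr/bin/env python
# Minimal restore-verification sketch: pull the latest backup into a scratch
# area and compare checksums against a manifest written at backup time.
# Every path and the "restore-tool" command are made-up placeholders.
import hashlib
import subprocess
import sys
from pathlib import Path

SCRATCH = Path("/restore_test/scratch")
MANIFEST = Path("/backups/latest.manifest")   # lines of "relative/path sha256hex"
RESTORE_CMD = ["restore-tool", "--latest", "--target", str(SCRATCH)]

def sha256(path):
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def main():
    subprocess.run(RESTORE_CMD, check=True)   # actually restore the data
    bad = 0
    for line in MANIFEST.read_text().splitlines():
        rel, expected = line.rsplit(None, 1)
        restored = SCRATCH / rel
        if not restored.exists() or sha256(restored) != expected:
            print("MISMATCH:", rel)
            bad += 1
    print("restore test:", "PASS" if bad == 0 else "FAIL (%d files)" % bad)
    return 1 if bad else 0

if __name__ == "__main__":
    sys.exit(main())

Run something like that against a scratch box once a month and "you'll never know" stops being true.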

New YOU do, I ADVOCATE that, doesn't mean that it happens
New Re: YOU do, I ADVOCATE that, doesn't mean that it happens
So, that means lost data eventually.

Plan for the worst case and implement worst-case recovery plans.

Plus, if you don't test regularly...

Backups are like expensive insurance. You hate to spend money on it, hate to devote time to it, hate to have to have it. But when all else fails and backups are there and save all of your customers' data... you'll be glad you had them.

Sort of like going with "no-fault" only on a 2010 Mercedes-Benz 500SEL. If you crash it, you are stuck fixing it yourself, slowly and at great cost. If you had full coverage, you might even get a new 2010 to replace it. Almost as if nothing happened.

The first time you have catastrophic data loss, you had better make sure you've archived *ALL* of your advice and are able to prove people brushed it off, otherwise... it's your ass.
New Saved in several places
New inexcusable
Basic risk analysis would point this potential weakness out. We won't TOUCH a datastore or database server without a well-defined maintenance window with risks and contingencies identified, and the first step after taking the database offline is a backup. The risk of making the backup is infinitesimal compared to what they're facing now.

Also, any fool who'd design a service like that without a) at least one georedundant data center, and b) a separate geo-redundant backup system should be hung from the nearest yardarm. Likewise, anyone who'd create this task without a known-good backup as part of a fleshed out contingency plan should join him.

I've seen a fair number of articles (mostly written by idiots) claiming this is yet another reason that cloud computing is teh bad. Personally, I don't think you can apply that label here because this service is apparently only slightly better equipped than the high school webgenius with a stack of servers in a back bedroom cooled with a box fan and served from wifi stolen from a neighbor.

We're a company that sells services based on cloud and grid computing, and that kind of reporting is what pisses me off. Yes, we're georedundant on backups and content delivery, and will be on application serving within a year. And no, we don't have on the order of 11MM users, but we apparently take the access and security of our client data a hell of a lot more seriously than Danger did.
New Little harsh there, don't you think?
You make it sound like computer systems are notoriously unreliable, perhaps due to their inherent complexity. Do you really think the recent history of computing bears out such a pessimistic outlook?
--

Drew
New I think it was right on...
I personally feel that the entire stack of people responsible for this screw-up includes:

The people who gave the OK to Hitachi. FIRST!

The people in charge of backups (tested backups).

The people who planned this event without properly recognizing the risks and therefore having a contingency plan all set up.

The SAN operators who allowed this to proceed without a proper SNAPSHOT saved off to another SAN, unless they were (provably) strong-armed.

Last but not least, the CTO and IT managers, possibly even the CEO, as the ultimate responsibility is his.



And yes, computers are KNOWN to be that unreliable. Google doesn't even fix broken machines in data centers. They have a policy that *IF* a machine stops responding, a reboot request is sent through the "console setup" they have. If the machine comes back, it is automatically re-imaged and made "better". If it fails to respond, it is shut down and left to decay in place until the "rack" itself is removed.

They figure it's going to cost them a minimum of $600 to send a warm body out there, find the machine, reboot it and (possibly) fix it. When the cost of a new machine for them is so low... it's not even worth the time and effort to deal with failures any other way than to ignore them.
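
Their policy, paraphrased as code (just my loose reading of the above; the class and names are invented for illustration, not anything Google actually runs):

# Toy paraphrase of the "ignore broken machines" policy described above.
class Machine:
    def __init__(self, name, comes_back_after_reboot):
        self.name = name
        self.comes_back_after_reboot = comes_back_after_reboot
        self.state = "unresponsive"

    def reboot(self):
        self.state = "up" if self.comes_back_after_reboot else "unresponsive"

def handle_unresponsive(machine):
    # One automated reboot request through the console, never a $600 truck roll.
    machine.reboot()
    if machine.state == "up":
        machine.state = "reimaged"      # came back: wipe it and return it to the pool
    else:
        machine.state = "powered_off"   # didn't: left to decay until the rack is pulled
    return machine.state

if __name__ == "__main__":
    for m in (Machine("node-a", True), Machine("node-b", False)):
        print(m.name, "->", handle_unresponsive(m))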
New Well played, sir. :-)
New Oh dude... did he zing me or what?
Of course, I live it. So it's hard to see the sarcasm when you are so close to it.
New That should be "'hanged' from the nearest yardarm" . . .
. . hung is something else.
New rman to a different array than database
I insisted on an rman to virt but was overruled. It's gonna happen sooner or later. I do have georedundancy for a different application in place; those folks understood.
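
For the record, pointing it at another array isn't much work. A minimal sketch, assuming OS-authenticated "rman target /" and a made-up mount point for the second array:

#!/usr/bin/env python
# Sketch: drive an RMAN backup whose output lands on a different array than
# the one holding the datafiles.  The mount point and channel name are placeholders.
import subprocess

RMAN_SCRIPT = """
RUN {
  ALLOCATE CHANNEL d1 DEVICE TYPE DISK FORMAT '/mnt/backup_array/db_%U';
  BACKUP DATABASE PLUS ARCHIVELOG;
}
"""

def run_backup():
    proc = subprocess.run(["rman", "target", "/"],
                          input=RMAN_SCRIPT, capture_output=True, text=True)
    if proc.returncode != 0:
        raise RuntimeError("RMAN backup failed:\n" + proc.stdout + proc.stderr)
    return proc.stdout

if __name__ == "__main__":
    print(run_backup())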
New Test? You're supposed to test?
A few years back, when my employer was using tape backup, 3,000 tapes per backup, I was advocating changing. I told them that even with a 99.995% good rate, there were still going to be bad tapes. Nah, never happen.

Did a disaster recovery test. Had to go back 3 or 4 weeks of tapes before we found a complete set...

Much better now, but still a long way from what I'd call reliable.
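
Quick back-of-the-envelope on those numbers (assuming independent tape failures at that 99.995% rate):

# 3,000 tapes per set at a 99.995% per-tape "good" rate, failures assumed independent.
p_good = 0.99995
tapes_per_set = 3000

p_complete_set = p_good ** tapes_per_set      # chance every tape in a set is good
expected_bad = tapes_per_set * (1 - p_good)   # expected bad tapes per set

print("P(complete good set) = %.3f" % p_complete_set)      # ~0.861
print("Expected bad tapes per set = %.2f" % expected_bad)  # ~0.15

So even at that optimistic rate, roughly one backup set in seven has at least one bad tape; needing to go back 3 or 4 weeks for a complete set says the real rate was a lot worse than 99.995%.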
New Yup, saw it coming
Point and click vendor techs are the most dangerous thing you can let loose in your server room.
New I have no direct experience with Hitachi...
... but their equipment is at the very high end of the scale. I doubt they would be sending in point and click techs to work on it.

(The closest I came was in '05, back in Belgium, working for a city hospital. We put out a public bid for a fully redundant SAN installation. We were drooling over the specs of their entry. Hitachi was then trying to get beyond the mainframe/supercomputer market, so the price was an absolute killer. Unfortunately for them (and us), their partner vendor submitted the bid a day late...)
New EMC is as good as Hitachi
and yes they do send in point and click techs
New I have a buddy that works for HDS
He wasn't in implementation or service, but always spoke highly of their field staff. I got the impression there were some pretty impressively smart people working there.
     rofl microsoft - (boxley) - (20)
         Check how the business press reports it - (crazy)
         Blame shifted to Hitachi - (scoenye) - (18)
             I was afraid of that.... - (folkert) - (17)
                 not so fast, with SAN connectivity getting smarter - (boxley) - (12)
                     Not so fast yourself... - (folkert) - (11)
                         YOU do, I ADVOCATE that, doesn't mean that it happens -NT - (boxley) - (9)
                             Re: YOU do, I ADVOCATE that, doesn't mean that it happens - (folkert) - (1)
                                 Saved in several places -NT - (boxley)
                             inexcusable - (Steve Lowe) - (6)
                                 Little harsh there, don't you think? - (drook) - (3)
                                     I think it was right on... - (folkert)
                                     Well played, sir. :-) -NT - (Another Scott) - (1)
                                         Oh dude... did he zing me or what? - (folkert)
                                 That should be "'hanged' from the nearest yardarm" . . . - (Andrew Grygus)
                                 rman to a different array than database - (daemon)
                         Test? You're supposed to test? - (jbrabeck)
                 Yup, saw it coming - (crazy) - (3)
                     I have no direct experience with Hitachi... - (scoenye) - (2)
                         EMC is as good as Hitachi - (boxley)
                         I have a buddy that works for HDS - (Steve Lowe)
