IWETHEY v. 0.3.0

OK, I'm going to chime in.
Having dealt with RAID arrays for more than 20 years (mostly SCSI, but recently some SATA), I'll say this: they are cranky as hell. There is no way I'd try any modification except a totally fresh install with all identical drives (and I keep spare drives for my better clients, the ones that pay their bills). I just plain wouldn't try anything else. RAID arrays are trouble enough under the very best circumstances.

Now, be aware that my experience is not in "enterprise" systems but in critical small-business environments (medical and wholesale distribution), running mostly Unix and Linux, but also some Windows servers.

RAID1 (mirror) is what I've used in most cases, because my clients' needs fit well within the capacity of available drives, but software firms I work with sometimes send me RAID5 configurations (hopefully with a hot spare) to set up.
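For anyone wondering what separates the two: RAID1 just keeps two identical copies, while RAID5 spreads data across the drives plus one parity block per stripe, computed with XOR. That parity is what lets a hot spare be rebuilt after a single drive dies. The sketch below is illustrative only: the 4-byte "blocks" and function names are hypothetical, and real controllers use much larger chunks and rotate the parity block across the disks.

```python
from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length blocks (the RAID5 parity operation)."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def rebuild_missing(survivors: list[bytes]) -> bytes:
    """Recover ONE lost block: XOR of every surviving data block plus the parity block."""
    return xor_blocks(survivors)


# One stripe on a hypothetical 4-disk RAID5: three data blocks plus parity.
data = [b"ACCT", b"PATI", b"INVC"]
parity = xor_blocks(data)

# Pretend the disk holding data[1] died; rebuild it from the other three.
recovered = rebuild_missing([data[0], data[2], parity])
print(recovered)  # b'PATI'
```

Note that a single XOR parity can solve for exactly one missing block per stripe. If a second block in the same stripe is unreadable during the rebuild, there is nothing left to solve with, which is why RAID5 rebuilds are so nerve-wracking.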

Those are the only two RAID configurations I'd want to deal with.

The big problem is RAID controllers. They fail about as often as the hard disks - and the one you have installed is obsolete and no longer sold, but hopefully you can find one on eBay, because otherwise YOUR DATA IS LOST.

I tell folks with RAID controllers they're insane not to buy a spare controller (and at least one RAID controller manufacturer has told me as much) - but of course, after 5 years sitting on the shelf, the chance of your spare actually working is only about 80%.

Now, SATA RAID is a bit different because it's "software RAID" even when it lives in the machine's BIOS (though if you want to be really strict in your logic, so is the RAID provided by the BIOS of a $1,000 SCSI RAID controller), but the purveyors of SATA RAID seem much less serious about their job.

A nice feature of SATA RAID (whether in the motherboard BIOS or provided by Linux) is that the "dumb as a box of rocks" client is never bothered by a screaming RAID controller, so he doesn't even know he has a failed array until enough drives have failed to bring down the whole machine - permanently.

And, by the way, the client SHOULD be "dumb as a box of rocks" as far as this stuff is concerned; s/he's supposed to be smart about something else entirely. Unfortunately, most are too cheap to have someone constantly monitoring this stuff and planning for the inevitable failure.
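On Linux software RAID (md), the kernel does at least report array state in /proc/mdstat, and mdadm's own --monitor mode can mail an alert when an array degrades. The sketch below just shows the check such a monitor makes, so a cron job could flag a silently degraded array. Assumptions: Linux md arrays only (BIOS "fake RAID" sets may be managed differently), the sample text is hypothetical, and how the alert gets delivered is left out.

```python
import re

# An array header line, e.g. "md0 : active raid1 sdb1[1] sda1[0]".
ARRAY_LINE = re.compile(r"^(md\w+)\s*:")
# The member status, e.g. "[2/1] [U_]": 2 members expected, 1 working; "_" marks a dead slot.
STATUS = re.compile(r"\[(\d+)/(\d+)\]\s+\[([U_]+)\]")


def degraded_arrays(mdstat_text: str) -> list[str]:
    """Return the names of md arrays that are running with missing members."""
    degraded = []
    current = None
    for line in mdstat_text.splitlines():
        header = ARRAY_LINE.match(line)
        if header:
            current = header.group(1)
            continue
        status = STATUS.search(line)
        if status and current:
            expected, working = int(status.group(1)), int(status.group(2))
            if working < expected or "_" in status.group(3):
                degraded.append(current)
            current = None  # only the first status line belongs to this array
    return degraded


# Hypothetical sample: md0 has lost a member (marked "(F)"), md1 is healthy.
SAMPLE = """Personalities : [raid1]
md0 : active raid1 sdb1[1](F) sda1[0]
      976630336 blocks super 1.2 [2/1] [U_]

md1 : active raid1 sdd1[1] sdc1[0]
      488254464 blocks super 1.2 [2/2] [UU]

unused devices: <none>
"""

print(degraded_arrays(SAMPLE))  # ['md0']
# On a real Linux box: degraded_arrays(open("/proc/mdstat").read())
```

Of course, a check like this is only worth something if somebody actually reads the alert.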

Then, of course, there's the problem of simultaneous SATA drive failures, which seem to happen much more often than the statistics would predict.

And, of course, more often than a drive failure, a program has munched its own data - and written it to the array - so RAID is of no help whatsoever.

So, all in all, I have to say that RAID is not significantly better than regular (verified**) backups and a RAID failure may take a lot longer to recover from than re-entering a day's data (if you are a major insurance company your experience may vary).

Of course, I do have one totally paranoid client (by far my wealthiest, and the one who pays my invoices within 3 days and has never questioned the amount). He has a nice stable Debian Linux server with hardware RAID1. Every evening after closing (they close early), the identical backup server does an NFS mount and copies all the files from the main server. Then certain critical files are automagically backed up to a Zip disk (yes, this system has been running that long), then both servers back up to their DAT4 tape drives, and when that is all safely done, the main server backs up to an off-line service in the mountains of Colorado. In the morning the president of the company checks the logs to make sure there were no failures. In 20 years we have lost nary a byte.

** And - one last shot - an unverified backup is not a backup. Period. End of discussion.
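The post doesn't say how the verification is done. One simple approach, assumed here, is to compare a SHA-256 digest of every file in the source tree against its copy, much like checking the nightly NFS copy to the backup server. The function names and directory layout are hypothetical. For tape or Zip media, the same comparison should be run against what you actually read back from the medium, not against the files you meant to write.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def file_digest(path: Path, chunk_size: int = 1 << 20) -> str:
    """SHA-256 of a file, read in chunks so large files don't fill memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """List files under `source` that are missing from, or differ in, `backup`."""
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        backed_up = backup / rel
        if not backed_up.is_file():
            problems.append(f"missing: {rel.as_posix()}")
        elif file_digest(src) != file_digest(backed_up):
            problems.append(f"differs: {rel.as_posix()}")
    return problems


# Demo with throwaway directories standing in for the main and backup servers.
with tempfile.TemporaryDirectory() as tmp:
    server, backup = Path(tmp, "server"), Path(tmp, "backup")
    (server / "ledger").mkdir(parents=True)
    (server / "ledger" / "jan.dat").write_bytes(b"invoice 1001")
    (server / "ledger" / "feb.dat").write_bytes(b"invoice 1002")
    shutil.copytree(server, backup)
    # Simulate a silently corrupted copy (letter O instead of zero).
    (backup / "ledger" / "feb.dat").write_bytes(b"invoice 10O2")
    report = verify_backup(server, backup)

print(report)  # ['differs: ledger/feb.dat']
```

An empty list means every source file has a byte-identical copy; it says nothing about whether the source itself was already munched, which is why keeping several generations of backups still matters.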

+5, Informative.
Really, the only sure RAID array is a backed-up-and-verified one.

The only piece of information you missed is that RAID 6 (RAID 5 with two parity stripes) came about because there was an anomaly in the statistical failure rates WRT partition size that made secondary failures more likely while rebuilding from a disk failure. RAID 6 only puts that off; it doesn't solve it. That is why a decent vendor will recommend a complete rebuild-and-restore from the controller up once a disk has died and been replaced...

Wade.

Q:Is it proper to eat cheeseburgers with your fingers?
A:No, the fingers should be eaten separately.
Can I ask an iggerent question?
Everyone seems to be saying that RAID is fragile, that you can't really rebuild easily, that it may be faster but at the expense of multiple single points of failure ... Why do people use them? Is it just the ability to address multiple physical volumes as a single large one?
--

Drew
Striping gives faster access, too. More info.
Spreading the read and write access across drives can give substantially higher throughput than that from a single spindle. RAID's a reasonable compromise in many cases, but too many people use it as a substitute for backups. It's not.
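Striping (RAID 0, and the data layout underneath RAID 5) hands out consecutive chunks to the drives round-robin, so a large sequential read keeps every spindle busy at once. A minimal sketch of that address mapping follows; the chunk size and disk count are hypothetical, real controllers and Linux md may lay chunks out differently, and plain striping like this has no parity and therefore no fault tolerance at all.

```python
from collections import Counter


def stripe_location(lba: int, n_disks: int, chunk_blocks: int) -> tuple[int, int]:
    """Map a logical block number to (disk index, block offset on that disk) for plain striping."""
    chunk, within = divmod(lba, chunk_blocks)  # which chunk, and where inside it
    disk = chunk % n_disks                     # chunks go round-robin across the disks
    row = chunk // n_disks                     # full passes over all disks that came before
    return disk, row * chunk_blocks + within


# Hypothetical array: 3 disks, 4-block chunks.
print(stripe_location(5, 3, 4))   # (1, 1)
print(stripe_location(13, 3, 4))  # (0, 5)

# A 12-block sequential read lands evenly on all three disks,
# so (ideally) each spindle only has to deliver a third of it.
per_disk = Counter(stripe_location(b, 3, 4)[0] for b in range(12))
print(dict(per_disk))  # {0: 4, 1: 4, 2: 4}
```

The same spreading is why losing any one drive in a plain stripe takes out pieces of nearly every file.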

RAID5 failures almost guaranteed with multi-TB arrays:
http://www.zdnet.com...rking-in-2009/162

SSDs aren't a panacea, either:
http://www.enterpris...ge-Networking.htm

Google's famous paper on HD reliability:
http://labs.google.c...disk_failures.pdf (13-page PDF)
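The "RAID5 failures almost guaranteed" argument rests on simple arithmetic: rebuilding a degraded RAID5 means reading every surviving drive end to end, and a single unrecoverable read error (URE) during that read leaves a stripe that, depending on the controller, may not be rebuildable. The sketch below assumes a URE rate of 1 per 10^14 bits (a figure often quoted on consumer SATA spec sheets; check your own drives' datasheets) and treats errors as independent, which is a simplification.

```python
import math


def ure_probability(bytes_read: float, ure_per_bit: float = 1e-14) -> float:
    """Chance of at least one unrecoverable read error while reading `bytes_read` bytes.

    Assumes each bit independently fails with probability `ure_per_bit`.
    """
    bits = bytes_read * 8
    # P(no error) = (1 - p) ** bits, computed via log1p to avoid precision loss.
    p_clean = math.exp(bits * math.log1p(-ure_per_bit))
    return 1.0 - p_clean


TB = 10**12  # decimal terabyte, as drive vendors count it
results = {tb: ure_probability(tb * TB) for tb in (1, 4, 12)}
for tb, p in results.items():
    print(f"reading {tb:>2} TB during a rebuild: {p:.0%} chance of at least one URE")
```

Under these assumptions, reading 12 TB of surviving data gives roughly a 62% chance of hitting at least one URE. RAID 6's second parity can cover such an error during a single-drive rebuild, which fits the point above that it postpones the problem rather than solving it.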

Hard drives (and controllers) fail eventually. Gotta have backups.

Cheers,
Scott.
If the machine still runs . . .
. . . though it may be slow, the array will usually rebuild fine when the bad drive is replaced.

Starting the rebuild, though, can be extremely nerve-wracking. The RAID controller software is often so bad that a normal person has no idea whether they are about to write an empty disk over their data or write the data to the empty disk. So far I've never gone the wrong way.

If the controller goes bad, you may have no data left by the time you learn it's the controller.

I remember a booth at a big vendor show (remember those? Bags of swag and an excuse for why you had to be there instead of at work?) from the disk recovery firm Rotating Memories. This was back when hard disks were big and clunky, and veeeeeeeeery expensive. They displayed a high-capacity drive they had not been able to recover: the owner got so frustrated he shot it with a .38 revolver. Of course, it turned out the drive had been fine; it was the controller that wasn't working right.
Sounds like something my father might have done
He's a mechanic, and always hated intermittent problems. He said, "I'd rather the engine block splits in half and falls out, so I can point to it and say, 'That's what I need to fix.'"
--

Drew