Post #291,094
8/18/07 3:03:30 PM
8/18/07 3:15:49 PM
Like others said, but that's purely political
I doubt your current contract mandates clean drives. There will be some type of bullshit best-effort statement, but it'll be meaningless and unenforceable.
It's very unlikely you'll get any changes now, since the system is already in place, and it will cost a LOT more than the cost of the disks, since now you're pushing the cost of lawsuits onto them.
EMC arrays toss perfectly fine drives all the damn time. There are bunches of system-configurable parameters that control your tolerance for retries at the sector, disk, bus, etc. levels. Every time there is a hint of an error, the array has the choice of marking the drive bad and moving on.
It will toss the drive.
Now you can either rebuild to the drive or swap it. Since it is far less likely that it is really the drive's fault (transient errors are almost always the bus, connectors and cables, and (flip, flip, flip) electromagnetic solar flares due to the (garbled) effect), rebuilding to the same drive is a realistic option.
In my experience, we'd have a couple of possibly-failed drives in transition. Since we didn't know what really caused the first failure, we'd put a different drive in to rebuild. If that one failed the next week in the same manner, we'd know it was not a drive problem but an array issue, and start screaming about that.
If they say the array is fine, then I'd suggest you notch up the retries associated with whatever is failing, since it'll keep happening and you're dealing with a design flaw or an unfixable environmental problem at that point. Or you learn to scream louder and get a new drive tray, etc.
So, you might find yourself returning fewer drives once you've gone through a couple of cycles of real troubleshooting.
At this point, we focus on the drives. If a drive has failed in such a way that you simply can't communicate with it, you don't have the ability to treat it. You MIGHT be able to get them to sell it to you at cost, but I wouldn't bet on it, since they're just sending it back to the vendor anyway for a replacement. Accept the fact that you will eat the cost of a few drives.
In the case of drives that you can communicate with, set up a workstation with whatever controller board is required to talk to them (by the way, what type of drives?), and do a secure wipe before sending them back.
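The wipe itself is nothing exotic on a stock Linux box. A minimal sketch, assuming the pulled drive shows up as an ordinary block device (/dev/sdX is just an example, not anything vendor-specific):

```bash
#!/bin/bash
# Sketch: overwrite a pulled drive before it goes back to the vendor.
# Assumes the drive appears as a plain block device, e.g. /dev/sdb.
DEV=${1:?usage: wipe.sh /dev/sdX}

# Refuse to touch anything that is currently mounted.
if grep -q "^$DEV" /proc/mounts; then
    echo "$DEV appears to be mounted, refusing to wipe" >&2
    exit 1
fi

# Two overwrite passes plus a final zero pass, so a quick spot check
# afterwards should read back all zeros.
shred -v -n 2 -z "$DEV"
```

If the contract ever does grow real wording about sanitization, swap the shred call for whatever standard it names; the plumbing around it stays the same.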
If we go worst case scenario on costs, assume it is a full time job:
Decent PC, lots of slots, high-end 3rd party drive boards: $20K
Full-time tech, fully loaded cost: $100K
Of course, this is not a full-time job, since the initial setup should take a couple of weeks, and the ongoing drive swap and menu choice to clear it should be an entry-level clerical job. The key is that you have a low-level Linux / hardware capable BOFH available for when the drive errors require a judgement call.
I figure it would take me 2 days to set up the box, set up a menu interface, and whip up a bunch of scripts that would allow hot-adding the failed drives and wiping them. If you wanted to be brain-dead about it, allow a power cycle to probe them out and bring them in on boot.
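A sketch of what that menu loop might look like; assumptions here are a stock Linux box with hot-plug capable controllers, the sysfs SCSI rescan mechanism, and example /dev/sdX names, none of it array-vendor specific:

```bash
#!/bin/bash
# Sketch of the clerical-level loop: rescan for newly attached drives,
# let the operator pick one, confirm, wipe. Device names are examples.

rescan_buses() {
    # Ask every SCSI host adapter to probe for newly attached devices.
    for host in /sys/class/scsi_host/host*; do
        echo "- - -" > "$host/scan"
    done
}

while true; do
    rescan_buses
    echo "Attached disks:"
    lsblk -d -o NAME,SIZE,MODEL
    read -r -p "Drive to wipe (e.g. sdb), or q to quit: " pick
    [ "$pick" = "q" ] && break
    dev="/dev/$pick"
    [ -b "$dev" ] || { echo "No such block device: $dev"; continue; }
    read -r -p "Really wipe $dev? Type yes to confirm: " ok
    [ "$ok" = "yes" ] && shred -v -n 1 -z "$dev"
done
```

The brain-dead version is even simpler: drop the rescan function, reboot with the drives attached, and run the wipe on whatever shows up.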
Hmm, actually, now that I think about it, I could see this being a portable device. One that would be mandated for use in the field. Hell, I'm gonna patent that sucker.