Re: Question about raid robustness when disk fails

Asdo <asdo@xxxxxxxxxxxxx> · Wed, 27 Jan 2010 11:43:56 +0100

Goswin von Brederlow wrote:
Is it possible to cancel a SATA/SCSI command that is being executed by
the drive?
(it's probably feasible only with NCQ disabled anyway, but it's easy
to disable NCQ)

Do you want to do that? I would rather have the drive keep trying and
return an error if it can't read, so the raid layer rewrites the blocks,
causing them to be remapped. I do not want to wait for that, but I want it
to happen.

So you want that to happen in the background?
There isn't much benefit in having it happen in the background, imho.
Why not just have an error returned after a timeout, and let the normal MD
read-error recovery procedure kick in (recomputation from parity and
rewrite of the damaged block)?
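The recovery path mentioned above (recompute the unreadable block from the other members, then write it back so the drive gets a chance to remap the sector) can be sketched for a RAID-5 stripe, where parity is the XOR of the data blocks. This is only an illustration, not MD's actual code: the helper names `xor_blocks` and `reconstruct_missing`, the 3+1 stripe and the 4-byte blocks are all hypothetical, and real MD also deals with rotating parity, RAID-6 syndromes and errors on the rewrite itself.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """XOR equal-length blocks byte by byte (RAID-5 parity arithmetic)."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks in a stripe must have the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def reconstruct_missing(stripe: list[bytes | None]) -> bytes:
    """Rebuild the single unreadable member of a RAID-5 stripe (hypothetical helper).

    `stripe` holds every member's block (data blocks plus the parity block);
    the member that returned a read error is None. Since parity is the XOR of
    all data blocks, the XOR of all surviving members equals the missing one.
    """
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) != 1:
        raise ValueError("RAID-5 can only recover exactly one missing block")
    return xor_blocks([blk for blk in stripe if blk is not None])


# Hypothetical stripe: 3 data disks plus 1 parity disk, 4-byte blocks.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)

# Disk 1 returns a read error for this block.
stripe = [data[0], None, data[2], parity]
recovered = reconstruct_missing(stripe)
# The RAID layer would now rewrite `recovered` to disk 1 so the drive can remap the sector.
print(recovered == data[1])
```

The point of returning the error promptly is that this recomputation is cheap; what makes the situation painful is only the time the drive spends retrying before it reports anything.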

It's a pity we have to rely on TLER, this narrows the choice of drives
a lot...

I don't. I just acknowledge the limitation and accept the downtime to
find and remove a broken but not properly failed disk. I use raid so I
don't lose my data when a disk fails, not primarily for availability.
So far I had one case in 10 years where a failing disk took down my
system.

The time might be so long that MD or the controller drops the entire
drive. It didn't happen to me, but I think I read reports of this on
this ML...
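The TLER question comes down to which timer expires first: the drive's own internal error recovery (capped at a few seconds on drives with TLER/ERC, potentially much longer on drives without it) or the host's command timeout, after which the kernel aborts the command and resets the device, and MD or the controller may end up failing the whole member. The sketch below models that race; the `Drive` class, `classify_bad_sector_read`, the outcome strings and the example timings are hypothetical. On Linux the host-side value is the per-device `/sys/block/<dev>/device/timeout` (30 seconds by default), and on drives supporting SCT ERC the drive-side limit can be queried or set with `smartctl -l scterc`.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Drive:
    name: str
    # Worst-case time (seconds) the drive spends retrying an unreadable sector
    # before reporting an error. TLER/ERC drives cap this; others may not.
    internal_recovery_s: float


def classify_bad_sector_read(drive: Drive, host_timeout_s: float) -> str:
    """Hypothetical model of which timer wins when a read hits a bad sector."""
    if drive.internal_recovery_s < host_timeout_s:
        # The drive gives up first and reports a read error: MD can recompute
        # the block from the other members and rewrite it, keeping the disk.
        return "read error -> MD recomputes and rewrites the block"
    # The host timer expires first: the command is aborted and the device is
    # reset; if recovery fails, the disk may be failed out of the array.
    return "command timeout -> reset, disk may be dropped from the array"


# Illustrative timings only (not measured values).
for drive in (Drive("with TLER/ERC", 7.0), Drive("no ERC", 120.0)):
    print(drive.name, "->", classify_bad_sector_read(drive, host_timeout_s=30.0))
```

Under this model, the alternative to TLER drives is raising the host timeout above the drive's worst-case recovery time, which trades the risk of a dropped disk for the long stall described above.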

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html