Re: Raid5 drive fail during grow and no backup

Phil Turmel <philip@xxxxxxxxxx> · Thu, 04 Dec 2014 15:02:51 -0500

Hi Phillip,

On 12/04/2014 02:29 PM, Phillip Susi wrote:
> On 11/7/2014 10:36 PM, Phil Turmel wrote:
>> However, if the device with the bad sector is trying to recover
>> longer than the linux low level driver's timeout, bad things^TM
>> happen. Specifically, the driver resets the SATA (or SCSI)
>> connection and attempts to reconnect.  During this brief time, it
>> will not accept further I/O, so the write back of the reconstructed
>> data fails.  Then the device has experienced a *write* error, so MD
>> fails the drive.  This is the out-of-the-box behavior of
>> consumer-grade drives in raid arrays.
> 
> What?  During the recovery action ( reset and retry ), a write being
> issued to the drive should just sit in the request queue until after
> the drive finishes being reset; it should not just be failed outright.

It's been a few years since I directly tested this myself, but that's
what happened at the time.  The window in which the write gets rejected
might be small, but it's there (unless a recent fix has closed it).

I'm not an expert on the driver stack, though.  YMMV.
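
A minimal sketch of the timeout mismatch discussed above, for anyone who
wants to check their own disks.  It reads the kernel's per-device command
timeout from sysfs (/sys/block/<dev>/device/timeout, in seconds) and
compares it with how long the drive might spend on internal error
recovery.  The 120 second recovery figure and the helper names are
assumptions for illustration, not measured values or an existing tool.

```python
from pathlib import Path
import sys


def read_kernel_timeout(dev, sys_root="/sys/block"):
    """Return the low-level driver's command timeout in seconds for e.g. 'sda'."""
    path = Path(sys_root) / dev / "device" / "timeout"
    return int(path.read_text().strip())


def timeout_is_safe(kernel_timeout_s, drive_recovery_s):
    """True if the driver waits longer than the drive may spend recovering.

    If this is False, the driver can reset the link while the drive is
    still retrying the bad sector, which is the failure case in this thread.
    """
    return kernel_timeout_s > drive_recovery_s


if __name__ == "__main__":
    dev = sys.argv[1] if len(sys.argv) > 1 else "sda"
    # Assumption: worst-case internal recovery time of a consumer drive
    # with no error recovery limit set.  Adjust for your hardware.
    assumed_recovery_s = 120
    timeout_s = read_kernel_timeout(dev)
    verdict = "ok" if timeout_is_safe(timeout_s, assumed_recovery_s) else "too short"
    print(f"{dev}: kernel timeout {timeout_s}s vs assumed recovery "
          f"{assumed_recovery_s}s: {verdict}")
```

Run it with a device name such as sda as the only argument.  It only
reads sysfs and changes nothing on the system.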

Phil