Re: faulty disk testing

Mark Lord <mlord@xxxxxxxxx> · Tue, 05 Sep 2006 09:48:43 -0400

Tejun Heo wrote:

> So, no, libata won't drop a drive unless it fails to respond to recovery 
> sequence.  libata just doesn't have enough information about how devices 
> are used to determine whether a device is failing too often to be useful.

Sure it does.  It can determine the number of consecutive failures on
the same drive/channel, and it can also count intervening successes, if any.

From that, at a minimum, it could notice that the same drive has gone 'round
the error treadmill (say) 20 times in a row, with no other I/O possible on it
because it has yet to successfully complete the reset+reinit phase.

Such a drive is a candidate for pushing the error upstairs,
and possibly for getting offlined.

Fancier fault-handling is also possible, but the bare minimum is that we
must not get stuck forever looping in the EH code.  Eventually a failed status
has to be returned to the layers above, I think.

Cheers
--
Mark Lord
Real-Time Remedies Inc.
mlord@xxxxxxxxx
