Tejun Heo wrote:
> So, no, libata won't drop a drive unless it fails to respond to recovery sequence. libata just doesn't have enough information about how devices are used to determine whether a device is failing too often to be useful.
Sure it does. It can determine the number of consecutive failures on the same drive/channel, and it can also count intervening successes, if any.
From that, at a minimum, it could notice that the same drive has gone 'round the error treadmill (say) 20 times in a row, with no other I/O possible on it because it has yet to successfully complete the reset+reinit phase. Such a drive is a candidate for pushing the error upstairs, and possibly for getting offlined.

Fancier fault-handling is also possible, but the bare minimum is that we must not get stuck forever looping in the EH code. Eventually a failed status has to be returned to the layers above, I think.

Cheers

--
Mark Lord
Real-Time Remedies Inc.
mlord@xxxxxxxxx