On 14/08/2009 14:09, Paweł Brodacki wrote:
> 2009/8/13 John Robinson <john.robinson@xxxxxxxxxxxxxxxx>:
>> Can or could md be made or configured to try re-adding a device if this
>> sort of thing happens? After all, a stray cosmic ray or whatever perhaps
>> shouldn't make one lose redundancy if the drive's actually OK?
> I think that from the coding point of view md probably could. The more
> important question is whether it should. The only hard fact is that there
> was an error while accessing the device. md has no way of telling whether
> it was just a freak accident or the drive is unreliable from now on.
Ah well, perhaps we need to give md a way of knowing the difference
between a transient error (that has been recovered from) and a more
serious error.
> Therefore it does the one safe thing and says "I won't trust you
> anymore." If a human being knows better, that human is free to
> re-add the drive.
> Personally I'd hate having a suspicious drive auto-added in the hope
> that it will rebuild and function properly.
I wouldn't want it to be the default behaviour, but I'd like the option
to configure things that way. I'd want the maximum number of automatic
re-adds to be configurable too.
> Because such an option could seem tempting, but could and would cause
> a loss of reliability, I'd expect bad publicity if it were actually added.
But it could improve reliability too. Suppose a glitch on drive A's cable
(a cosmic ray, say) gets drive A taken out of the array even though the
drive itself is fine. If drive B then fails before the operator has
re-added drive A, the array goes down when it didn't need to.
What is the operator's most likely response to seeing the SATA bus
reset? She's going to re-add the drive, assuming it was a transient
error. If we could make this happen automatically, we could close the
window during which the array is more vulnerable. I wouldn't suggest we
do it silently; it would be logged, notified etc. just as the drive
being taken out of the array is.
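To make the proposal concrete, here is a minimal sketch in Python of the policy being discussed. It is not md code and not an existing mdadm option: `ErrorKind`, `ReAddPolicy`, `on_device_failed` and `max_auto_readds` are hypothetical names. It also assumes md could already tell a recovered transient error (such as a SATA bus reset) from a persistent one, which is exactly the piece md lacks today. With the limit left at its default of 0 it behaves as md does now: the device stays out until a human re-adds it (e.g. with mdadm's --re-add).

```python
import logging
from dataclasses import dataclass, field
from enum import Enum, auto

log = logging.getLogger("md-readd-sketch")


class ErrorKind(Enum):
    # Hypothetical classification; md cannot make this distinction today.
    TRANSIENT = auto()   # e.g. a SATA bus reset the drive recovered from
    PERSISTENT = auto()  # e.g. an unrecoverable media error


@dataclass
class ReAddPolicy:
    # Hypothetical knob: 0 means never auto re-add (current md behaviour).
    max_auto_readds: int = 0
    # Per-device count of automatic re-adds already performed.
    readds: dict = field(default_factory=dict)

    def on_device_failed(self, device: str, kind: ErrorKind) -> bool:
        """Return True if the device should be re-added automatically."""
        used = self.readds.get(device, 0)
        if kind is ErrorKind.TRANSIENT and used < self.max_auto_readds:
            self.readds[device] = used + 1
            # Never silent: every automatic re-add is logged.
            log.warning("%s: transient error, auto re-add %d of %d",
                        device, used + 1, self.max_auto_readds)
            return True
        log.warning("%s: taken out of array, waiting for operator", device)
        return False
```

With `ReAddPolicy(max_auto_readds=2)`, the first two transient failures on the same device are re-added (and logged); a third transient failure, or any persistent one, leaves the drive out for the operator to decide.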
Cheers,
John.