Re: Help understanding the root cause of a member dropping out of a RAID 1 set.

On 14/08/2009 14:09, Paweł Brodacki wrote:
> 2009/8/13 John Robinson <john.robinson@xxxxxxxxxxxxxxxx>:
>
>> Can or could md be made or configured to try re-adding a device if this
>> sort of thing happens? After all, a stray cosmic ray or whatever perhaps
>> shouldn't make one lose redundancy if the drive's actually OK?
>
> I think that from the coding point of view md probably could. The more
> important thing is if it should. The only hard fact is that there was
> an error while accessing the device. md has no way of telling if it
> was just a freak accident, or the drive is unreliable from now on.

Ah well, perhaps we need to give md a way of knowing the difference between a transient error (that has been recovered from) and a more serious error.

> Therefore it does the one safe thing and says "I won't trust you
> anymore.". If a human being knows better, the said being is free to
> re-add the drive.
>
> Personally I'd hate having a suspicious drive being auto-added in hope
> it will rebuild and function properly.

I wouldn't want it to be the default behaviour, but I'd like the option to configure things that way. I'd want the number of auto-re-adds configurable too.

> Because such an option could seem tempting but could and would cause
> loss of reliability I'd expect bad publicity if it was actually added.

But it could improve reliability too. Suppose the cable on drive A is hit by cosmic rays and the drive is taken out of the array, even though the drive itself is still fine. If drive B then fails before the operator has re-added drive A, the array goes down when it didn't need to.

What is the operator's most likely response to seeing the SATA bus reset? She's going to re-add the drive, assuming it was a transient error. If we could make this happen automatically, we could close a window during which the array is more vulnerable. I wouldn't suggest we do it silently; it would get logged, notified etc., just like the drive being taken out of the array would be.
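
To make the idea concrete, here is a rough sketch in Python of the kind of policy I have in mind. It is not md or mdadm code, and nothing like it exists in md as far as this thread establishes; the class name ReAddPolicy, the max_readds setting and the device names are all made up for illustration. It only models the decision: off by default, a configurable cap on automatic re-adds per device, and every decision logged.

```python
import logging
from collections import defaultdict

log = logging.getLogger("md-readd-sketch")  # hypothetical logger name


class ReAddPolicy:
    """Hypothetical policy: allow at most max_readds automatic re-adds per device.

    The default of 0 means "never re-add automatically", i.e. the
    behaviour stays as it is unless the admin opts in.
    """

    def __init__(self, max_readds=0):
        self.max_readds = max_readds
        self.attempts = defaultdict(int)  # device name -> re-adds used so far

    def should_readd(self, device):
        """Return True if the device may be re-added automatically once more."""
        if self.attempts[device] >= self.max_readds:
            # Budget exhausted (or feature disabled): leave it to the operator.
            log.warning("%s: not re-adding automatically (%d of %d used)",
                        device, self.attempts[device], self.max_readds)
            return False
        self.attempts[device] += 1
        # Never silent: an automatic re-add is logged just like a removal.
        log.warning("%s: automatic re-add %d of %d",
                    device, self.attempts[device], self.max_readds)
        return True


if __name__ == "__main__":
    policy = ReAddPolicy(max_readds=1)    # admin has opted in to one retry
    print(policy.should_readd("sdb1"))    # first transient error
    print(policy.should_readd("sdb1"))    # second error on the same drive
```

With max_readds=1 the first error on a drive gets one automatic re-add and the second is left for a human, who can still re-add by hand (for example with mdadm's --re-add in manage mode) if she knows better.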

Cheers,

John.

