Re: stopping md from kicking out "bad" drives

On Mon, 11 Nov 2013, Michael Tokarev wrote:

> No, really, that's not the solution I was asking for.

Well, it is.

> Yes, RAID6 is better in this context. But it has exactly the same properties
> when drives start "semi-failing": one bad sector on each of 3 drives, at
> different places, is enough for a catastrophic failure, while the array
> could otherwise continue to work normally because the bad sectors are in
> different places.
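A toy model of the scenario quoted above, as a sketch only: it assumes every stripe has `parity` redundant blocks (1 for RAID5, 2 for RAID6) and that the hypothetical `bad` map records, per drive index, the stripes holding an unreadable sector. It contrasts per-stripe recovery with ejecting every drive that has any bad sector; it does not model md's real layout or code.

```python
from __future__ import annotations


def stripes_recoverable(bad: dict[int, set[int]], parity: int) -> bool:
    """True if no stripe has more unreadable blocks than it has parity blocks."""
    per_stripe: dict[int, int] = {}
    for stripes in bad.values():
        for s in stripes:
            per_stripe[s] = per_stripe.get(s, 0) + 1
    return all(count <= parity for count in per_stripe.values())


def survives_kickoff(bad: dict[int, set[int]], parity: int) -> bool:
    """True if the array survives ejecting every drive with any bad sector."""
    kicked = sum(1 for stripes in bad.values() if stripes)
    return kicked <= parity


# RAID6: one bad sector on each of 3 drives, each in a different stripe.
bad = {0: {10}, 3: {500}, 5: {9000}}
print(stripes_recoverable(bad, parity=2))  # True
print(survives_kickoff(bad, parity=2))     # False
```

Every stripe is still recoverable on its own; only the decision to drop whole drives turns three scattered bad sectors into a lost array.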

If you have timeouts set properly, md will be able to recalculate the bad sector from parity and rewrite it, even with one drive failed.
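For the single-parity case, recomputing a block from parity is a byte-wise XOR of the surviving blocks in the stripe. The sketch below is a simplified RAID5-style illustration with hypothetical helper names; it does not cover RAID6's second (Q) syndrome, which md needs when a drive is already missing, nor md's actual stripe layout.

```python
from __future__ import annotations

from functools import reduce


def xor_blocks(blocks: list[bytes]) -> bytes:
    """Byte-wise XOR of a non-empty list of equally sized blocks."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("need a non-empty list of equally sized blocks")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def make_parity(data: list[bytes]) -> bytes:
    """P parity for one stripe: XOR of all its data blocks."""
    return xor_blocks(data)


def reconstruct(stripe: list[bytes | None], parity: bytes) -> list[bytes]:
    """Recompute the one unreadable block (marked None) from parity and the rest."""
    missing = [i for i, blk in enumerate(stripe) if blk is None]
    if len(missing) != 1:
        raise ValueError("single parity can rebuild exactly one missing block")
    survivors = [blk for blk in stripe if blk is not None]
    rebuilt = xor_blocks(survivors + [parity])
    # In md, the rebuilt data would then be rewritten to the drive.
    return [rebuilt if blk is None else blk for blk in stripe]


data = [b"\x01\x02", b"\x10\x20", b"\xff\x00"]
p = make_parity(data)
print(reconstruct([data[0], None, data[2]], p) == data)  # True
```

Rewriting the recomputed block is what gives the drive a chance to remap the bad sector, which only happens if the drive reports the read error instead of being ejected first.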

> It is the drive kick-off, the decision made by the md driver, that makes the failure catastrophic.

That's what the timeout problem is about. If you're running consumer drives with the default Linux kernel timeouts, the kernel's 30-second command timeout expires while the drive is still retrying internally, and the drive is kicked before it can return a read error.

> We may reduce the probability of such an event with various configuration tweaks, but the underlying problem remains.

The underlying problem is that your drives take longer to return errors than the kernel is configured to wait for a result from them.
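One way to state the invariant: the drive must give up and report the error before the kernel's command timeout expires. The sketch below assumes the per-device timeout is the SCSI layer's `/sys/block/<dev>/device/timeout` attribute (in seconds, 30 by default) and uses 180 seconds as a fallback, a value often suggested for drives without a configurable error-recovery limit; the helper names and that fallback are assumptions, not md settings. The other common remedy is to cap the drive's own recovery time with SCT ERC, e.g. `smartctl -l scterc,70,70 /dev/sdX` (units of 100 ms), on drives that support it.

```python
from __future__ import annotations

from pathlib import Path

# Assumption: each SCSI disk's command timeout (seconds) is exposed at
# /sys/block/<dev>/device/timeout; the kernel default is 30.
DEFAULT_SYSFS = Path("/sys")
# Assumption: a commonly suggested value for drives with no known ERC limit.
FALLBACK_TIMEOUT_S = 180


def read_error_surfaces(drive_recovery_s: float | None, kernel_timeout_s: float) -> bool:
    """True if the drive gives up and reports a read error before the kernel times out.

    drive_recovery_s is the drive's error-recovery limit (e.g. 7.0 with SCT ERC
    set to 70); None means no known limit, as on many consumer drives.
    """
    if drive_recovery_s is None:
        return False
    return drive_recovery_s < kernel_timeout_s


def ensure_timeout(dev: str, drive_recovery_s: float | None,
                   fallback_s: int = FALLBACK_TIMEOUT_S,
                   sysfs: Path = DEFAULT_SYSFS) -> int:
    """Raise the kernel command timeout for `dev` if the drive may outlast it.

    Returns the timeout in effect afterwards. Needs root on a real system,
    and the value does not persist across reboots.
    """
    path = sysfs / "block" / dev / "device" / "timeout"
    current = int(path.read_text().strip())
    if read_error_surfaces(drive_recovery_s, current):
        return current  # the read error reaches md in time; nothing to change
    # Wait at least the fallback, and longer than any known recovery limit.
    needed = fallback_s if drive_recovery_s is None else max(fallback_s, int(drive_recovery_s) + 1)
    new = max(current, needed)
    if new != current:
        path.write_text(f"{new}\n")
    return new
```

On a real system, for a hypothetical member disk `sdb` with no ERC support, this would be called as root as `ensure_timeout("sdb", None)`; the `sysfs` parameter exists only so the logic can be exercised against a scratch directory.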

> Nope, because the array was (re)syncing onto a hot spare, not the first
> failed drive.

I don't understand why you would run RAID5 plus a spare instead of RAID6 without a spare.
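A back-of-the-envelope comparison behind that question, assuming six drives and counting redundancy in whole drives: both layouts leave the same data capacity, but after one failure RAID5 plus a spare has no redundancy left while it rebuilds onto the spare, whereas degraded RAID6 still has one. The `Layout` class is purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    parity: int  # redundant blocks per stripe
    spares: int  # idle hot spares

    def data_drives(self, n: int) -> int:
        """Usable capacity, in drives, for n physical drives."""
        return n - self.parity - self.spares

    def redundancy_left(self, failed: int) -> int:
        """Redundancy remaining while rebuilding after `failed` member losses.

        A hot spare adds nothing until the rebuild onto it has finished.
        """
        return self.parity - failed


RAID5_SPARE = Layout("RAID5 + spare", parity=1, spares=1)
RAID6 = Layout("RAID6", parity=2, spares=0)

for layout in (RAID5_SPARE, RAID6):
    print(layout.name, layout.data_drives(6), layout.redundancy_left(1))
# RAID5 + spare 4 0
# RAID6 4 1
```

With zero redundancy during the rebuild, a single unreadable sector on any remaining member (or one more drive kick-off) is enough to lose data.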

--
Mikael Abrahamsson    email: swmike@xxxxxxxxx