Re: stopping md from kicking out "bad" drives

11.11.2013 11:41, Mikael Abrahamsson wrote:
> On Mon, 11 Nov 2013, Michael Tokarev wrote:
>
>> The question is: what's missing currently to prevent kicking drives from md arrays at all?  And I really mean preventing _both_ the first failed drive (before the start of resync) and the second failed drive.
>
> Cranking up the timeout settings a lot might help (I use 180 seconds); it would probably have stopped the first drive from being kicked out.
>
> But you really should be running RAID6 and not RAID5 (as you have now seen) to handle the failure case you just observed.
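The timeout change suggested above is usually applied per disk through sysfs. A minimal Python sketch follows, assuming the SCSI command timeout for each disk lives at /sys/block/<dev>/device/timeout (value in seconds). The function name set_disk_timeouts and its sysfs_root parameter are hypothetical; sysfs_root exists only so the function can be tried against a scratch directory instead of the live system.

```python
"""Sketch: raise the SCSI command timeout for md member disks via sysfs.

Assumption: the per-disk timeout is exposed at
/sys/block/<dev>/device/timeout, in seconds. Writing the real files
requires root; set_disk_timeouts is a hypothetical helper.
"""
from pathlib import Path

DEFAULT_TIMEOUT_S = 180  # the value reported in the thread


def set_disk_timeouts(devices, seconds=DEFAULT_TIMEOUT_S, sysfs_root="/sys/block"):
    """Write `seconds` to <sysfs_root>/<dev>/device/timeout for each device.

    Returns a dict mapping device name -> previous timeout (int), so the
    old values can be restored later.
    """
    previous = {}
    for dev in devices:
        path = Path(sysfs_root) / dev / "device" / "timeout"
        previous[dev] = int(path.read_text().strip())
        path.write_text(f"{seconds}\n")
    return previous
```

Run as root, e.g. set_disk_timeouts(["sda", "sdb", "sdc"]); the setting is written at runtime only, so it has to be reapplied after each boot.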

No, really, those are not the solutions I was asking for.

Yes, RAID6 is better in this context.  But it has exactly the same weakness
when drives start "semi-failing": one bad sector on each of 3 drives, in
different places, is enough for a catastrophic failure, even though the
array could continue to work normally precisely because the bad sectors
are in different places.

It is the drive kick-off, the decision made by the md driver, that makes
the failure catastrophic.
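The argument can be made concrete with a toy model. The sketch below is a hypothetical illustration, not md code: each drive is described by the set of stripes where it has an unreadable sector, and RAID6 is assumed to rebuild any stripe with at most two unreadable members. It compares surviving stripe by stripe against the kick-out behaviour, where every drive that hits a bad sector is removed as a whole.

```python
"""Toy model: per-stripe recovery vs. whole-drive kick-out in RAID6.

Hypothetical illustration, not md code. Each drive maps to the set of
stripe numbers where it has an unreadable sector.
"""
from collections import Counter

PARITY = 2  # assumption: RAID6 rebuilds a stripe with up to two members missing


def per_stripe_ok(bad, parity=PARITY):
    """True if no stripe has more than `parity` unreadable members."""
    per_stripe = Counter(s for stripes in bad.values() for s in stripes)
    return all(n <= parity for n in per_stripe.values())


def kick_out_ok(bad, parity=PARITY):
    """True if the array survives when every drive with any bad sector
    is ejected as a whole (the behaviour discussed in the thread)."""
    kicked = sum(1 for stripes in bad.values() if stripes)
    return kicked <= parity


# Three of six drives each have one bad sector, each in a different stripe.
bad_sectors = {
    "sda": {10}, "sdb": {500}, "sdc": {9000},
    "sdd": set(), "sde": set(), "sdf": set(),
}
print(per_stripe_ok(bad_sectors))  # True: every stripe is still recoverable
print(kick_out_ok(bad_sectors))    # False: three drives kicked, only two parity
```

The same three bad sectors are harmless when handled stripe by stripe and fatal when each one costs a whole drive.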

We may reduce the probability of such an event with various configuration
tweaks, but the underlying problem remains.

> A write-intent bitmap would have avoided the initial full resync of the drive that was kicked out, which might have helped as well.

Nope, because the array was (re)syncing a hot spare, not the first failed
drive.
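For context, a write-intent bitmap records which regions were written while a member was absent, so re-adding that same member only needs those regions copied. The hypothetical sketch below models this with (offset, length) writes and a fixed bitmap chunk size (64 MiB here is an arbitrary example value), and also shows why it does not help in the case above: a fresh hot spare holds no valid data, so every chunk has to be rebuilt.

```python
"""Toy model of how a write-intent bitmap narrows a resync.

Hypothetical sketch: writes are (offset, length) pairs in bytes, with
length > 0, and the bitmap tracks fixed-size chunks of the space it covers.
"""


def dirty_chunks(writes, chunk_bytes):
    """Indices of bitmap chunks touched by any write."""
    dirty = set()
    for offset, length in writes:
        first = offset // chunk_bytes
        last = (offset + length - 1) // chunk_bytes
        dirty.update(range(first, last + 1))
    return dirty


def chunks_to_resync(size_bytes, chunk_bytes, writes, fresh_spare):
    """Number of chunks that must be copied to bring a member in sync."""
    total = -(-size_bytes // chunk_bytes)  # ceiling division
    if fresh_spare:
        return total  # a new spare holds no valid data: full rebuild
    return len(dirty_chunks(writes, chunk_bytes))


CHUNK = 64 * 2**20  # 64 MiB bitmap chunk (arbitrary example value)
SIZE = 4 * 2**40    # 4 TiB covered by the bitmap
writes = [(0, 4096), (100 * 2**20, 8192)]  # written while the member was out
print(chunks_to_resync(SIZE, CHUNK, writes, fresh_spare=False))  # 2
print(chunks_to_resync(SIZE, CHUNK, writes, fresh_spare=True))   # 65536
```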

I asked about the write-intent bitmap because it could act as a
semi-permanent "list of bad blocks on component devices": instead of
kicking the whole device out, md could mark just the "bad place" on it in
the bitmap (the place where we weren't able to write _new_ data) and keep
using the device, only avoiding reads from the marked-as-bad places
(because even if such a read succeeds, the data there is already stale).
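The proposal can be sketched as follows. This is a hypothetical model of the idea, not existing md behaviour: three members form an XOR-parity stripe (RAID5-style), each member keeps a set of stripes marked bad, a failed write marks the stripe bad instead of failing the member, and a read of a marked block is served by reconstruction from the other members. Clearing the mark on a later successful write is an assumption of the sketch.

```python
"""Sketch of the proposal: remember bad regions per member instead of
failing the whole member.

Hypothetical model, not md code: an XOR-parity stripe (RAID5-style)
where each member keeps a set of stripe numbers whose content is stale.
"""
from functools import reduce


class Member:
    def __init__(self, n_stripes):
        self.blocks = [b"\x00"] * n_stripes
        self.bad = set()          # stripes holding stale data on this member
        self.write_fails = set()  # simulated media errors on write

    def write(self, stripe, data):
        if stripe in self.write_fails:
            self.bad.add(stripe)  # mark the bad place, keep the device in use
            return
        self.blocks[stripe] = data
        self.bad.discard(stripe)  # assumption: a successful rewrite clears the mark


def xor(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))


def read(members, idx, stripe):
    """Read member idx's block, reconstructing it if marked bad."""
    if stripe not in members[idx].bad:
        return members[idx].blocks[stripe]  # not marked: read it directly
    others = [m for i, m in enumerate(members) if i != idx]
    if any(stripe in m.bad for m in others):
        raise OSError(f"stripe {stripe}: more than one member is bad")
    return xor([m.blocks[stripe] for m in others])


members = [Member(4) for _ in range(3)]  # two data members + one parity
members[1].write_fails.add(0)
d0, d1 = b"\x0f", b"\xf0"
members[0].write(0, d0)
members[1].write(0, d1)              # fails: stripe 0 marked bad on member 1
members[2].write(0, xor([d0, d1]))   # parity
print(members[1].bad)                # {0}
print(read(members, 1, 0) == d1)     # True
```

Only the single stale block is avoided; member 1 stays in the array and keeps serving every other stripe.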

Thanks,

/mjt

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html



