Re: detection/correction of corruption with raid6

On Fri, 2008-12-19 at 09:40 +0100, piergiorgio.sartor@xxxxxxxx wrote:
> Hi,
> 
> thanks for the answer.
> I still have some comments on the topic; see below.
> 
> > Suppose we agree that bit flips don't happen (undetected) on drive
> > media, but that bit flips can happen elsewhere (memory, IO bus,
> > etc.).
> > 
> > And then suppose we discover that a bit flip has happened. What does
> > that tell us?
> > Maybe it tells us that our hardware is dodgy, so it cannot be
> > trusted to reliably do anything we tell it. So maybe we shouldn't
> > tell it to do anything??
> 
> Maybe I should try to clarify the concept.
> There are *two* use cases.
> One is the "check" and one is the "repair".
> As I already wrote, I agree that "repair" needs some deeper
> thinking. It is easy to see cases where it could cause more
> damage.
> The "check" case is another story.
> In the case of RAID-6, I would like, as an RFE, the logs to
> report on which "drive" or "data path" the mismatch occurs, when
> this is detectable.
> So, if the mismatch count says there are 1024 mismatches, it
> would be nice to know whether they all belong to the same drive.
> In that case, it would be possible to fail/remove that drive and
> check the hardware (change drive/cable/connector/etc.).
> 
> Ideally, at the end of the "check", the log should report how
> many mismatches there were, how many are "undeterminable" (multiple
> drives), and how many could be attributed to a specific drive.
> This would help to diagnose a problem, perhaps one first flagged by
> the CRC in the filesystem.

Agreed :)
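A minimal sketch of how such a per-drive report could be computed, assuming the standard RAID-6 syndromes: P is the XOR of the data bytes and Q is the sum of g^i * D_i over GF(2^8), with generator g = 2 and polynomial 0x11d. If a single data disk z holds a wrong byte, the P and Q discrepancies satisfy Qx = g^z * Px, so z = log(Qx) - log(Px). The function names (`syndromes`, `classify`, `check_report`) are hypothetical, each stripe is reduced to one byte per disk for brevity, and this is not md's actual check code.

```python
from collections import Counter

# Assumption: RAID-6 Q syndrome over GF(2^8) with polynomial 0x11d, generator g = 2.
GF_EXP = [0] * 512   # GF_EXP[i] = g^i (doubled so a sum of two logs needs no mod)
GF_LOG = [0] * 256   # GF_LOG[g^i] = i; GF_LOG[0] is unused
x = 1
for i in range(255):
    GF_EXP[i] = x
    GF_LOG[x] = i
    x <<= 1
    if x & 0x100:
        x ^= 0x11D
for i in range(255, 512):
    GF_EXP[i] = GF_EXP[i - 255]


def gf_mul(a: int, b: int) -> int:
    """Multiply two field elements using the log/antilog tables."""
    if a == 0 or b == 0:
        return 0
    return GF_EXP[GF_LOG[a] + GF_LOG[b]]


def syndromes(data):
    """Return (P, Q) for one byte column; data[i] is the byte on data disk i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(GF_EXP[i], d)
    return p, q


def classify(data, p, q):
    """Hypothetical per-stripe verdict: "ok", "P", "Q", a data disk index,
    or "undeterminable"."""
    cp, cq = syndromes(data)
    px, qx = p ^ cp, q ^ cq          # stored parity vs. parity recomputed from data
    if px == 0 and qx == 0:
        return "ok"
    if qx == 0:
        return "P"                   # only P disagrees: suspect the P block
    if px == 0:
        return "Q"                   # only Q disagrees: suspect the Q block
    # One bad data disk z gives qx = g^z * px, hence z = log(qx) - log(px).
    z = (GF_LOG[qx] - GF_LOG[px]) % 255
    return z if z < len(data) else "undeterminable"


def check_report(stripes):
    """Tally the non-ok verdicts over an iterable of (data, p, q) tuples."""
    report = Counter()
    for data, p, q in stripes:
        verdict = classify(data, p, q)
        if verdict != "ok":
            report[verdict] += 1
    return report


if __name__ == "__main__":
    import random

    random.seed(1)
    stripes = []
    for n in range(1024):
        data = [random.randrange(256) for _ in range(4)]   # 4 data disks
        p, q = syndromes(data)
        if n % 4 == 0:
            data[2] ^= 0x10                                # simulated bit flip on disk 2
        stripes.append((data, p, q))
    print(check_report(stripes))
```

A real implementation would classify every byte of a stripe and attribute the stripe to a drive only if all non-zero results agree. With two or more corrupted disks the computed index can still land on a valid disk by chance, so a per-drive count is a strong hint about which drive or path to inspect, not proof.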

> That is for the "check". As for the "repair", the only change I
> could see is to offer the user the option to "reset the parity" of
> the array or to "recalculate the data" (we could ask on this mailing
> list how many would like to have that possibility), with a warning
> that the second one can do more damage than has already been done.
Yes, there is of course the possibility of doing damage, but I think if it's
2 vs 1, that's something most people would bet on, at least if there are
multiple occurrences all with the same "1".

:)
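The "2 vs 1" repair can be sketched in the same spirit, again as a hypothetical illustration assuming the standard RAID-6 syndromes (P = XOR, Q over GF(2^8) with polynomial 0x11d and g = 2). A data byte is corrected (XORed with the P discrepancy) only when P and Q both implicate the same single data disk; a lone P or Q mismatch means only that parity block is rewritten, and anything else is left untouched. Here multiplication uses shift-and-add instead of log tables, and the bad disk is located by trying each index. The names `syndromes` and `repair_column` are made up for this example.

```python
# Assumption: standard RAID-6 syndromes (P = XOR, Q over GF(2^8), poly 0x11d, g = 2).

def gf_mul(a: int, b: int) -> int:
    """Shift-and-add multiplication in GF(2^8) modulo 0x11d."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return r


def gf_pow2(i: int) -> int:
    """Return g^i with g = 2."""
    r = 1
    for _ in range(i):
        r = gf_mul(r, 2)
    return r


def syndromes(data):
    """Return (P, Q) for one byte column; data[i] is the byte on data disk i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow2(i), d)
    return p, q


def repair_column(data, p, q):
    """Hypothetical "2 vs 1" repair of one column. Returns (data_after, action)."""
    cp, cq = syndromes(data)
    px, qx = p ^ cp, q ^ cq
    if px == 0 and qx == 0:
        return list(data), "ok"
    if qx == 0:
        return list(data), "rewrite P"      # data and Q agree; only P is off
    if px == 0:
        return list(data), "rewrite Q"      # data and P agree; only Q is off
    for z in range(len(data)):
        if gf_mul(gf_pow2(z), px) == qx:    # P and Q both point at disk z
            fixed = list(data)
            fixed[z] ^= px                  # px is exactly the error on disk z
            return fixed, f"data disk {z}"
    return list(data), "undeterminable"     # no single disk explains it: leave alone
```

The "rewrite P"/"rewrite Q" branches correspond to "reset the parity", and the data-disk branch to "recalculate the data", which is the branch that can make things worse if more than one disk is actually bad.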

> 
> My conclusion is that the "check" should be more
> clever with RAID-6, and "repair/resync" *might* be more
> flexible (with warnings).


> 
> I take the opportunity to wish you all Merry Christmas
> and Happy New Year.

And to you too!
> 
> bye,
> 

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html
