Re: <DKIM> Re: Paranoid mode for RAID-1 ?

David Brown <david.brown@xxxxxxxxxxxx> · Mon, 27 Apr 2015 12:54:17 +0200

On 27/04/15 12:18, Jean-Baptiste Thomas wrote:
> On 2015-04-27 20:45 +1200, Pieter De Wit wrote:
> 
>> Sorry for jumping in late - but let's say it does "work" and a
>> drive returns an error, is that data lost ? Or which drive is
>> "right"?
> 
> (Assuming that by "returns an error", you mean succeeds but the
> data does not match what the other(s) returned.)

The alternative interpretation here is that the drive returns an error
message saying it couldn't read the sector - then it's just standard
RAID (get the data from the other disks).  So we are looking here at the
extremely rare situation where there is an error but the drive (or
controller) does not detect it.

> 
> Let's say there is a setting for how many components must agree.
> If they're not unanimous, read all the other components and look
> for a majority. The components in the minority are flagged
> faulty and the array is degraded but the read succeeds.
> 
> If there is no majority, retry a few times. If a majority is
> found, all components which ever were in the minority are
> flagged faulty and the array is degraded but the read succeeds.
> 
> If no majority is found, degrade all components, fail the read
> and stop the array. Or whatever is needed to prevent all further 
> writes to this array and let the user investigate.

The problem with all of these is that they /might/ be right - but they
/might/ be wrong and make matters worse.  Even if you have 3 copies of
the sector, and get two matches and one different, there is no way to
determine that the odd one is wrong.  Perhaps a common bus or connector
fault caused the other two to be wrong.  Picking the "majority vote" may
decrease your chances of losing data (but may not - it depends on the
cause of the fault), but it certainly does not avoid the worst case
scenario.  Perhaps the best choice during normal usage (as distinct from
recovery or rebuild, when the array is not mounted) is to simply report
a failure to the layers higher up - that way you won't make matters
worse by returning data that may be wrong.
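
To make the concern concrete, here is a rough Python sketch of the
majority-vote read described in the quoted proposal.  It is only an
illustration, not md code: the function name majority_read, the quorum
parameter and the in-memory byte strings standing in for mirror
components are all assumptions made for this example.

```python
from collections import Counter

def majority_read(copies, quorum=None):
    """Vote across mirror copies of one sector (hypothetical helper).

    copies: list of bytes objects, one per RAID-1 component.
    Returns (data, minority), where minority lists the indexes of the
    components that disagreed with the winning value.
    Raises IOError when no value reaches the quorum.
    """
    if quorum is None:
        quorum = len(copies) // 2 + 1      # strict majority by default
    data, votes = Counter(copies).most_common(1)[0]
    if votes < quorum:
        raise IOError("no majority among mirror copies")
    minority = [i for i, c in enumerate(copies) if c != data]
    return data, minority

# Two components share a fault (say, a common bus) and return the same
# corrupted sector; only the third one holds the correct data.
good = b"payload"
corrupt = b"pAyload"
data, minority = majority_read([corrupt, corrupt, good])
# The vote picks the corrupted value and flags the one good component.
```

In that last call the vote returns the corrupted sector and marks
component 2, the only correct one, as the minority.  Voting has no way
to tell this case from the one where the odd copy really is the bad one.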

Note that the checksum method (used by btrfs and zfs) is different in
that it lets the system know exactly which copy was bad even if the
drive (and bus and controller) think it was good.
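
A rough Python sketch of that difference, under loud assumptions: CRC-32
from zlib stands in for whatever checksum the filesystem really uses,
the checksum is assumed to be stored separately from the data and to be
trustworthy itself, and checked_read is a made-up name, not a btrfs or
zfs interface.

```python
import zlib

def checked_read(copies, stored_checksum):
    """Pick a copy whose checksum matches the one saved at write time.

    copies: list of bytes objects, one per mirror component.
    stored_checksum: CRC-32 recorded when the block was written
    (assumed to be kept apart from the data and to be intact).
    Returns (data, bad), where bad lists the indexes of the copies
    that failed verification.  Raises IOError if none verifies.
    """
    good = None
    bad = []
    for i, c in enumerate(copies):
        if zlib.crc32(c) == stored_checksum:
            if good is None:
                good = c                   # first verified copy wins
        else:
            bad.append(i)                  # this copy is known to be wrong
    if good is None:
        raise IOError("no copy matches the stored checksum")
    return good, bad

# Checksum recorded at write time, then two of three copies go bad
# without the drives, bus or controller noticing.
checksum = zlib.crc32(b"payload")
data, bad = checked_read([b"pAyload", b"pAyload", b"payload"], checksum)
```

Here the two corrupted copies are identified as bad even though they
agree with each other and outnumber the good one, because each copy is
judged against the checksum rather than against the other copies.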
