Peter Rabbitson wrote:
Theodore Tso wrote:
On Thu, Mar 20, 2008 at 03:19:08PM +0100, Bas van Schaik wrote:
There's no explicit message produced by the md module, no. You need to
check the /sys/block/md{X}/md/mismatch_cnt entry to find out how many
mismatches there are. Similarly, following a repair this will indicate
how many mismatches it thinks have been fixed (by updating the parity
block to match the data blocks).
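(As an aside for anyone following along: that counter is also easy to poll
from a monitoring program. A minimal sketch in C, assuming the array is md0;
adjust the path for other arrays:)

/* Minimal sketch: print the mismatch count left behind by the last
 * "check" or "repair" pass.  Assumes the array is md0. */
#include <stdio.h>

int main(void)
{
	FILE *f = fopen("/sys/block/md0/md/mismatch_cnt", "r");
	unsigned long long mismatches;

	if (!f) {
		perror("fopen /sys/block/md0/md/mismatch_cnt");
		return 1;
	}
	if (fscanf(f, "%llu", &mismatches) == 1)
		printf("mismatch_cnt: %llu\n", mismatches);
	fclose(f);
	return 0;
}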
Marvellous! I naively assumed that the module would warn me, but that's
not true. Wouldn't it be appropriate to print a message to dmesg if such
a mismatch occurs during a check? Such a mismatch clearly means that
there is something wrong with your hardware lying beneath md, doesn't it?
If a mismatch is detected in a RAID-6 configuration, it should be
possible to figure out what should be fixed (since with two independent
parity blocks there should be enough redundancy not only to detect an
error, but to correct it). Out of curiosity, does md do this
automatically, either when reading from a stripe, or during a resync
operation?
In my modest experience with root filesystems and high-performance
spools on various RAID levels, I can pretty much conclude that the
current check mechanism doesn't give the user enough power. We can
debate all we want about what the MD driver should do when it finds a
mismatch, yet there is no way for the user to figure out what the
mismatch is and take appropriate action. This does not apply only to
RAID5/6 - what about RAID1/10 with more than 2 chunk copies? What if the
single wrong copy is the one chosen and written over all the other good
blocks?
I think that the solution is rather simple, and I would contribute a
patch if I had any C experience. The current check mechanism remains
the same - mismatch_cnt is incremented/reset just as before. However,
for every mismatching chunk the kernel printks the following:
1) the start offset of the chunk (RAID1/10) or stripe (RAID5/6) within
the MD device
2) one line for every active disk containing:
a) the offset of the chunk within the MD component
b) an {md5|sha1}sum of the chunk
For a typical array this will take no more than 8 lines in dmesg.
However it will allow:
1) a human to determine at a glance which disk holds the mismatching
chunk in RAID1/10
2) the same to be determined for RAID6, using a userspace tool which
calculates the parity for every possible permutation of chunks
3) external tools to determine which file might have been affected on
the filesystem layered above
(a rough sketch of the kind of userspace tool point 1 implies follows below)
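To make that concrete, here is a rough userspace sketch of such a tool for
the RAID1 case: given a byte offset, it reads the same chunk from each
component device and prints a checksum per copy, so a human can see at a
glance which copy disagrees. The 64KiB chunk size, the omission of the
per-device data offset, and the FNV-1a hash standing in for an md5/sha1 sum
are all simplifying assumptions for illustration only.

/* Usage: ./chunksum <offset-bytes> /dev/sda1 /dev/sdb1 ...
 * Reads CHUNK_SIZE bytes at the given offset from every component named
 * on the command line and prints a per-copy checksum. */
#define _FILE_OFFSET_BITS 64
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>

#define CHUNK_SIZE (64 * 1024)

/* Simple FNV-1a hash, standing in for md5sum/sha1sum. */
static uint64_t fnv1a(const unsigned char *buf, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;
	for (size_t i = 0; i < len; i++) {
		h ^= buf[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

int main(int argc, char **argv)
{
	if (argc < 3) {
		fprintf(stderr, "usage: %s <offset-bytes> <component>...\n", argv[0]);
		return 1;
	}
	long long offset = atoll(argv[1]);
	static unsigned char buf[CHUNK_SIZE];

	for (int i = 2; i < argc; i++) {
		FILE *f = fopen(argv[i], "rb");
		if (!f || fseeko(f, offset, SEEK_SET) != 0 ||
		    fread(buf, 1, CHUNK_SIZE, f) != CHUNK_SIZE) {
			fprintf(stderr, "%s: read failed at %lld\n", argv[i], offset);
			if (f)
				fclose(f);
			continue;
		}
		printf("%s @ %lld: %016llx\n", argv[i], offset,
		       (unsigned long long)fnv1a(buf, CHUNK_SIZE));
		fclose(f);
	}
	return 0;
}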
Now of course the problem remains of how to repair the array using the
information obtained above. I think the best way would be to extend
the syntax of repair itself, so that:
echo repair > .../sync_action would use the old heuristics
echo repair <mdoffset> <component N> > .../sync_action would update the
chunk on drive N which corresponds to the chunk/stripe at mdoffset
within the MD device, using the information from the other drives, and
not the other way around as might happen with a plain repair.
I totally agree - not doing the most-likely-correct thing seems to be
the one remaining argument for hardware RAID. There are two cases in
which software can determine (a) whether it is likely that there is a
single bad block, and (b) what the correct value for that block is.
raid1 - more than one copy
If there are multiple copies of the data, and N-1 of them agree, then it
is more likely that the mismatched copy is the bad one, and it should be
rewritten with the data from the other copies. This is never less likely
to be correct than selecting one copy at random and writing it over all
the others, so it can only help.
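For illustration, a minimal sketch of that N-1 voting rule in userspace
terms (the buffers and sizes here are made up; a real tool would read the
chunk copies from the component devices):

#include <stdio.h>
#include <string.h>

/* Returns the index of the single copy that disagrees with every other
 * copy, -1 if all copies agree, or -2 if no clear majority exists. */
static int find_bad_copy(const unsigned char *copies[], int n, size_t len)
{
	int all_equal = 1;
	for (int i = 1; i < n; i++)
		if (memcmp(copies[0], copies[i], len) != 0)
			all_equal = 0;
	if (all_equal)
		return -1;

	/* look for exactly one copy that disagrees with all the others */
	int odd = -1, odd_count = 0;
	for (int i = 0; i < n; i++) {
		int differs = 0;
		for (int j = 0; j < n; j++)
			if (i != j && memcmp(copies[i], copies[j], len) != 0)
				differs++;
		if (differs == n - 1) {
			odd = i;
			odd_count++;
		}
	}
	return (odd_count == 1 && n > 2) ? odd : -2;
}

int main(void)
{
	unsigned char a[4] = {1, 2, 3, 4}, b[4] = {1, 2, 3, 4}, c[4] = {9, 2, 3, 4};
	const unsigned char *copies[] = {a, b, c};

	printf("bad copy index: %d\n", find_bad_copy(copies, 3, sizeof a));
	return 0;
}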
raid6 - assume and check
Given a mismatch in RAID-6, if parity A appears correct and parity B
does not, assume that the non-matching parity is bad and regenerate it.
If neither parity appears correct, then for each data block in turn
assume it is the bad one and recalculate a recovery value using the A
and B parities separately. If the data pattern generated is the same for
recovery using either parity, assume that data block is bad and rewrite it.
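For what it's worth, the RAID-6 algebra lets you locate a single bad data
block directly, without trying every permutation; this is the standard
syndrome argument (see hpa's "The mathematics of RAID-6"), written here
with P and Q for the two parities and assuming the stored P and Q are
themselves good:

  P = D_0 xor D_1 xor ... xor D_(n-1)
  Q = g^0.D_0 xor g^1.D_1 xor ... xor g^(n-1).D_(n-1)   (arithmetic in GF(2^8), g a generator)

If exactly one data block D_z is corrupt (on-disk value D'_z) and P', Q'
are recomputed from what is actually on disk, then

  P xor P' = D_z xor D'_z
  Q xor Q' = g^z . (D_z xor D'_z)

so g^z = (Q xor Q') / (P xor P') identifies z, and D_z = D'_z xor (P xor P')
recovers the original data.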
Again, this is more likely to be correct than assuming that both
parities are wrong. Obviously if no "most likely" bad data or parity
block can be identified, then recalculating both parity blocks is the
only way to "fix" the array, but it leaves bad data behind undetected. I
would like an option to do repairs using these two methods, which would
give a high probability that whatever "fixes" were applied actually
recovered the correct data.
Yes, I know that errors like this are less common than pure hardware
errors, but does that justify something less than best practice during recovery?
--
Bill Davidsen <davidsen@xxxxxxx>
"Woe unto the statesman who makes war without a reason that will still
be valid when the war is over..." Otto von Bismarck