Re: data scrubbing

David Brown <david.brown@xxxxxxxxxxxx> · Sat, 30 Jul 2011 00:16:55 +0200

On 29/07/11 23:51, Mathias Burén wrote:
On 29 July 2011 21:48, Beolach<beolach@xxxxxxxxx>  wrote:
On Fri, Jul 29, 2011 at 07:25, Nikolay Kichukov<hijacker@xxxxxxxxx>  wrote:

Hi,

This is good to know!

Just performed a check on a raid1 and got:

Jul 29 15:37:36 hanna64 mdadm[2277]: RebuildFinished event detected on md device /dev/md1, component device  mismatches found: 128

So I presume those mismatches have now been rewritten to both disks successfully. Am I wrong there?

cat /sys/block/md1/md/mismatch_cnt
128

That depends on whether you did a "check" or a "repair" - see the SCRUBBING
AND MISMATCHES section of the md(4) man page:
"If check was used, then no action is taken to handle the mismatch,
it is simply recorded.  If repair was used, then a mismatch will
be repaired in the same way that resync repairs arrays."

Good luck,
Beolach
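
A minimal Python sketch of driving the check/repair choice from a script, such as a weekly cron job. It assumes the md sysfs attributes `sync_action` and `mismatch_cnt` documented in md(4), found under `/sys/block/md1/md` for the array in this thread, and it needs root on a real system. The helper names (`start_scrub`, `scrub_state`, `wait_for_idle`, `mismatch_count`) are hypothetical and not part of mdadm.

```python
import time
from pathlib import Path

# Hypothetical helpers around the md sysfs interface described in md(4).
# md_dir is the array's sysfs directory, e.g. Path("/sys/block/md1/md").


def start_scrub(md_dir: Path, action: str) -> None:
    """Start a scrub: "check" only counts mismatches, "repair" also rewrites them."""
    if action not in ("check", "repair"):
        raise ValueError(f"unsupported scrub action: {action!r}")
    (md_dir / "sync_action").write_text(action + "\n")


def scrub_state(md_dir: Path) -> str:
    """Current sync_action, e.g. "idle", "check" or "repair"."""
    return (md_dir / "sync_action").read_text().strip()


def wait_for_idle(md_dir: Path, poll_seconds: float = 10.0) -> None:
    """Poll until the array reports "idle", i.e. the scrub has finished."""
    while scrub_state(md_dir) != "idle":
        time.sleep(poll_seconds)


def mismatch_count(md_dir: Path) -> int:
    """Value of mismatch_cnt as recorded by the most recent check or repair."""
    return int((md_dir / "mismatch_cnt").read_text().strip())
```

Usage would be along the lines of `start_scrub(Path("/sys/block/md1/md"), "check")`, then `wait_for_idle(...)` and `mismatch_count(...)`. Running "check" first and "repair" only after looking at the mismatches leaves room to choose the fix by hand.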
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Sorry to chime in like this. After reading the above, is there a
reason why anyone shouldn't _always_ use repair instead of check on a
weekly RAID6 check? You have to run repair anyway after a check if any
issues are found, right?

Or does the system become more vulnerable (less redundant) during a repair?

Thanks,
Mathias

If you do a repair, then when a mismatch is found one of the disks is
taken as the "bad" one and re-created.  For raid1, the first copy is
assumed correct.  For raid5/6, the data blocks are assumed correct and
the parities are re-created.  As Neil Brown explained on his blog,
without any more information this is as good as md raid can do.
However, it is not necessarily as good as /you/ can do.  For example,
you might be able to determine which files use the blocks in the
mismatched stripe, and figure out which block was bad.  Or for 3-disk
raid1 you could pick the bad block as the odd one out (assuming the
other two matched).  For raid6, it's possible to spot whether it is a
single-disk mismatch and correct that one disk: for each disk in turn,
assume it is missing and re-create it from the other disks using normal
raid6 recovery.  If the stripe is then consistent, you've fixed the
mismatch.  However, none of these approaches is guaranteed to be
correct either.  Thus "repair" just does the simplest and fastest
correction of the mismatch, and "check" does not change the stripe, in
case you want to manually pick a different method.
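
The strategies above can be sketched in Python on a single stripe held in memory. This is an illustration under stated assumptions, not md's code: blocks are equal-length `bytes`, the raid6 Q syndrome uses GF(2^8) with polynomial 0x11d and generator 2 (the convention Linux raid6 uses), and every function name is hypothetical. `repair_raid1_first_copy` and `repair_parity_md_style` mirror what md's "repair" is described as doing; `repair_raid1_majority` and `raid6_locate_single_bad` are the manual alternatives, the latter rebuilding each data disk from P as the usual single-disk recovery path.

```python
from collections import Counter
from functools import reduce

# Illustrative sketch: one stripe in memory, each block an equal-length bytes.
# Assumes Linux raid6's GF(2^8) with polynomial 0x11d and generator g = 2.


def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) (carry-less multiply, reduce by 0x11d)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D
        b >>= 1
    return result


def xor_blocks(blocks):
    """Bytewise XOR of a list of equal-length blocks."""
    return bytes(reduce(lambda x, y: x ^ y, column, 0) for column in zip(*blocks))


def raid6_pq(data):
    """P = D0 ^ D1 ^ ..., Q = g^0*D0 ^ g^1*D1 ^ ... over GF(2^8)."""
    q = bytearray(len(data[0]))
    coef = 1
    for block in data:
        for j, byte in enumerate(block):
            q[j] ^= gf_mul(coef, byte)
        coef = gf_mul(coef, 2)
    return xor_blocks(data), bytes(q)


def repair_raid1_first_copy(copies):
    """What md "repair" does on raid1: the first copy wins."""
    return [copies[0]] * len(copies)


def repair_raid1_majority(copies):
    """3+ disk raid1 alternative: keep the copy a strict majority agrees on."""
    value, votes = Counter(copies).most_common(1)[0]
    if votes <= len(copies) // 2:
        return None  # no majority, so no safe choice
    return [value] * len(copies)


def repair_parity_md_style(data):
    """What md "repair" does on raid5/6: trust the data, regenerate P (and Q)."""
    return raid6_pq(data)


def raid6_locate_single_bad(data, p, q):
    """Treat each disk in turn as missing, rebuild it, keep consistent results.

    Returns (index, rebuilt_block) pairs; index len(data) is P, len(data)+1 is Q.
    """
    n = len(data)
    new_p, new_q = raid6_pq(data)
    if new_p == p and new_q == q:
        return []  # stripe already consistent, nothing to fix
    candidates = []
    for i in range(n):
        # Single-disk recovery of data disk i from P and the other data blocks.
        rebuilt = xor_blocks(data[:i] + data[i + 1:] + [p])
        trial = data[:i] + [rebuilt] + data[i + 1:]
        if raid6_pq(trial)[1] == q:
            candidates.append((i, rebuilt))
    if new_q == q:  # data agrees with Q, so P is the odd one out
        candidates.append((n, new_p))
    if new_p == p:  # data agrees with P, so Q is the odd one out
        candidates.append((n + 1, new_q))
    return candidates
```

Exactly one returned candidate means a single disk was inconsistent and can be rewritten with the rebuilt block; no candidate means more than one disk disagrees, and the simple trial cannot decide which.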

<http://neil.brown.name/blog/20100211050355>
