Re: raid1 out of sync, but which files are affected?

Andreas Klauer <Andreas.Klauer@xxxxxxxxxxxxxx> · Sat, 26 Jan 2019 19:21:30 +0100

On Sat, Jan 26, 2019 at 11:49:13AM +0100, Harald Dunkel wrote:
> I initiated a check of my RAID1 (2 disks) this morning. mismatch_cnt
> is at 128 by now.

HDD or SSD? If SSD, same model, firmware, partition offsets, ...?
For SSD, TRIM/discard is a possible cause of [harmless] mismatches.
It depends on what each SSD returns for blocks it was told to TRIM.

(It's also possible for the Linux page cache to return old data after
TRIM. Basically you're not expected to try to read what was discarded.)

> how can I tell which blocks within md0 are affected?

The most intrusive method would be to split the RAID in two, 
and then mount each side separately, and compare files. 
But you probably don't want to do that...

To get byte offsets (divide by the sector size for sector numbers), if
not already logged elsewhere, you could create a read-only loop device
on each member drive with the appropriate data offset, then `cmp -l`
them. It's possible for false mismatches to appear if anything writes
new data while `cmp` runs.
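
As a rough sketch of the same comparison without loop devices, the
Python below reads both members directly, skips each one's data offset,
and yields the 512-byte sector numbers (relative to the start of the
array data) that differ. The function name, the device paths and the
offsets in the comments are placeholders, not anything from your setup;
the real data offset is what `mdadm --examine` reports, converted from
sectors to bytes. The caveat about concurrent writes applies here too.

```python
SECTOR = 512  # md reports offsets in 512-byte sectors


def mismatched_sectors(path_a, path_b, offset_a=0, offset_b=0,
                       length=None, chunk=1 << 20):
    """Yield sector numbers (relative to the data start) that differ.

    offset_a/offset_b are the members' data offsets in bytes.
    length limits the comparison in bytes (None = until one side ends).
    chunk must be a multiple of SECTOR.
    """
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        fa.seek(offset_a)
        fb.seek(offset_b)
        pos = 0
        while length is None or pos < length:
            want = chunk if length is None else min(chunk, length - pos)
            a = fa.read(want)
            b = fb.read(want)
            n = min(len(a), len(b))
            if n == 0:
                break
            a, b = a[:n], b[:n]
            if a != b:
                # Only walk the chunk sector by sector when it differs.
                for i in range(0, n, SECTOR):
                    if a[i:i + SECTOR] != b[i:i + SECTOR]:
                        yield (pos + i) // SECTOR
            pos += n
            if n < want:
                break  # short read: one side ended


# Hypothetical usage (needs read access to the member devices):
#   for s in mismatched_sectors("/dev/sdX1", "/dev/sdY1",
#                               2048 * SECTOR, 2048 * SECTOR):
#       print(s)
```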

For an online file-by-file comparison, you could make use of the
raid1-specific write-mostly flag. Set one disk write-mostly, drop
caches, checksum all files (now read exclusively from the
non-write-mostly disk), then unset write-mostly.
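
The flag and the cache drop can be scripted through the md sysfs
interface and /proc. A minimal sketch in Python, assuming the usual
/sys/block/<array>/md/dev-<member>/state layout; the function names are
mine, and the array and member names in the comments are placeholders.
Both functions need root when pointed at the real paths.

```python
import os


def set_write_mostly(array, member, enable, sysfs_root="/sys"):
    """Set or clear the write-mostly flag of one raid1 member."""
    state = os.path.join(sysfs_root, "block", array, "md",
                         "dev-" + member, "state")
    with open(state, "w") as f:
        # md accepts "writemostly" to set and "-writemostly" to clear.
        f.write("writemostly" if enable else "-writemostly")


def drop_caches(path="/proc/sys/vm/drop_caches"):
    """Flush dirty pages, then drop page cache, dentries and inodes."""
    os.sync()
    with open(path, "w") as f:
        f.write("3\n")


# Hypothetical usage for one pass:
#   set_write_mostly("md0", "sdY1", True)   # reads now prefer the other disk
#   drop_caches()
#   ... checksum all files ...
#   set_write_mostly("md0", "sdY1", False)
```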

Repeat for the other disk. Then compare hashes.
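
For the checksum passes themselves, something along these lines would
do. It is only a sketch: the function names are mine, SHA-256 is an
arbitrary choice, and os.walk will descend into other filesystems
mounted below the root, so you may want to restrict it. Run hash_tree
once per pass, keep both results, then compare them.

```python
import hashlib
import os


def hash_tree(root):
    """Return {relative path: SHA-256 hex digest} for regular files."""
    digests = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue  # skip symlinks, sockets, device nodes, ...
            h = hashlib.sha256()
            with open(path, "rb") as f:
                for block in iter(lambda: f.read(1 << 20), b""):
                    h.update(block)
            digests[os.path.relpath(path, root)] = h.hexdigest()
    return digests


def differing_files(pass_a, pass_b):
    """Paths present in both passes whose digests differ, sorted."""
    return sorted(p for p in pass_a.keys() & pass_b.keys()
                  if pass_a[p] != pass_b[p])
```

Files that exist in only one pass were created or deleted in between
and are left out on purpose.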

If you find a hash that does not match (and the file hasn't been 
written to, so it can't be explained), you can copy the entire 
file with the same method, and do a more detailed comparison.
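
For that more detailed comparison, a small sketch that reports which
byte ranges of the two copies differ. The 4096-byte granularity is an
assumption (a typical filesystem block size), and the function name is
hypothetical.

```python
def differing_ranges(path_a, path_b, block=4096):
    """Return [(start, end), ...] byte ranges that differ, end exclusive.

    Comparison is done per `block` bytes; adjacent differing blocks
    are merged into one range.
    """
    ranges = []
    pos = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block)
            b = fb.read(block)
            if not a and not b:
                break
            step = max(len(a), len(b))
            if a != b:
                if ranges and ranges[-1][1] == pos:
                    ranges[-1] = (ranges[-1][0], pos + step)
                else:
                    ranges.append((pos, pos + step))
            pos += step
    return ranges
```

Mapping such a range back to array sectors would additionally need the
file's extent layout, which this sketch does not attempt.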

Regards
Andreas Klauer