Re: raid1 out of sync, but which files are affected?

"Nik.Brt." <nik.brt@xxxxxxxxxxxxx> · Sun, 27 Jan 2019 19:25:31 +0100

On 27/01/2019 00:44, Eyal Lebedinsky wrote:
On 27/1/19 10:21 am, Nik.Brt. wrote:
On 26/01/2019 11:49, Harald Dunkel wrote:
I initiated a check of my RAID1 (2 disks) this morning. mismatch_cnt
is at 128 by now.

AFAIR 128 is a rounded number, and you are not going to get it more
precise than this: the count grows in steps set by the granularity of
the check, which is determined by the raid1 code.

There is no official way to do what you want.

Well, not exactly. What I did before the log messages were introduced
was to run 'check' operations in small sections. I had a script that
read the whole array in 32 sections by setting 'sync_min' and
'sync_max', then dealt with each faulty section by dividing it again
into 32 sections, and so on.

Running a 'check' of the whole array in very small sections takes a
very long time, hence the divide-and-conquer approach.
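The divide-and-conquer search can be sketched in C as below. This is a minimal sketch, not the script described above. The per-range check is a hypothetical callback (check_fn). In real use it would set sync_min/sync_max (both in sectors), start 'check' through sync_action, wait for that range to finish and read mismatch_cnt. None of that is implemented here. The names search, refine, find_mismatches and min_len are illustrative. The only parameter taken from the text is the 32-way split.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback: run a 'check' restricted to sectors [start, end)
 * and return the mismatch count it reports. In real use this would set
 * sync_min/sync_max, start 'check' via sync_action, wait for the range
 * to finish and read mismatch_cnt; none of that is implemented here. */
typedef uint64_t (*check_fn)(uint64_t start, uint64_t end, void *ctx);

struct range { uint64_t start, end; };      /* half-open, in sectors */

struct search {
    check_fn check;       /* per-range check (hypothetical) */
    void *ctx;            /* passed through to check */
    unsigned parts;       /* sections per level, e.g. 32; must be >= 2 */
    uint64_t min_len;     /* stop splitting at this size; must be >= 1 */
    struct range *out;    /* caller-provided result array */
    size_t max_out;       /* capacity of out */
    size_t n_out;         /* ranges found (may exceed max_out) */
};

/* Split [start, end) into s->parts sections, skip the clean ones and
 * split each dirty one again until it is at most s->min_len sectors. */
static void refine(struct search *s, uint64_t start, uint64_t end)
{
    uint64_t len = end - start;
    uint64_t step = (len + s->parts - 1) / s->parts;   /* ceil(len/parts) */
    if (step < s->min_len)
        step = s->min_len;          /* never check ranges below min_len */

    for (uint64_t a = start; a < end; a += step) {
        uint64_t b = (end - a > step) ? a + step : end;
        if (s->check(a, b, s->ctx) == 0)
            continue;                           /* clean section */
        if (b - a <= s->min_len) {              /* small enough: report */
            if (s->n_out < s->max_out)
                s->out[s->n_out] = (struct range){ a, b };
            s->n_out++;
        } else {
            refine(s, a, b);                    /* divide again */
        }
    }
}

/* Entry point: search [start, end) and return the number of suspicious
 * ranges found (the first max_out of them are stored in s->out). */
static size_t find_mismatches(struct search *s, uint64_t start, uint64_t end)
{
    s->n_out = 0;
    if (end > start)
        refine(s, start, end);
    return s->n_out;
}
```

Only sections that report mismatches are split again. Each bad spot therefore costs roughly parts checks per level of depth, independent of the array size. The step is clamped to min_len so the search never asks for ranges smaller than the granularity one is willing to resolve.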

Ah, this might work; good idea, I have not tried it. Have you ever got
it down to 8 (which would be 4 KiB, 1 page, the minimum addressable
size)?

In raid1.c there is this code:

    atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);

So it is incremented in steps of r1_bio->sectors, which I think is much
larger than 8. I may once have seen a value as small as 32 (that would
be 16 KiB), so that might be the rounding rather than 128 as I said,
but it is not 8 either.
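
For reference, mismatch_cnt is counted in 512-byte sectors. The helpers below translate the values discussed here and are illustrative only. The 4 KiB page size is an assumption that holds on x86-64. blocks_flagged further assumes, hypothetically, that every flagged block adds the same number of sectors to the counter.

```c
#include <stdint.h>

#define SECTOR_BYTES 512u   /* mismatch_cnt is counted in 512-byte sectors */
#define PAGE_BYTES   4096u  /* assumption: 4 KiB pages, as on x86-64 */

static uint64_t sectors_to_bytes(uint64_t sectors)
{
    return sectors * SECTOR_BYTES;
}

/* Number of pages spanned by `sectors`, rounded up. */
static uint64_t sectors_to_pages(uint64_t sectors)
{
    return (sectors * SECTOR_BYTES + PAGE_BYTES - 1) / PAGE_BYTES;
}

/* Number of flagged blocks, if each one adds exactly `block_sectors`
 * to the counter (hypothetical uniform step size). */
static uint64_t blocks_flagged(uint64_t mismatch_cnt, uint64_t block_sectors)
{
    return mismatch_cnt / block_sectors;
}
```

So 8 = 4 KiB (one page), 32 = 16 KiB and 128 = 64 KiB. A count of 128 could be one 128-sector step or sixteen 8-sector steps, and the counter alone cannot tell which.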

What is the smallest number you have seen with this strategy?

Another way is to log /proc/mdstat for the position and 'mismatch_cnt'
for the mismatch count, at short intervals (I used 10 seconds), until
all the mismatches are found. One can then stop the check by writing
'idle' to 'sync_action'.
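Here is a sketch of the parsing side of that polling loop. It assumes the usual progress line that /proc/mdstat prints during a check, e.g. `check = 12.6% (123456/976630336)`. The function names are hypothetical. The sketch also assumes the two figures are 1 KiB blocks (hence the conversion to sectors); verify that against your kernel before relying on it.

```c
#include <stdio.h>
#include <string.h>

/* Extract "done/total" from a /proc/mdstat progress line such as
 *   "[==>....]  check = 12.6% (123456/976630336) finish=80.1min ..."
 * Returns 1 on success, 0 if the line is not a 'check' progress line. */
static int parse_mdstat_progress(const char *line,
                                 unsigned long long *done,
                                 unsigned long long *total)
{
    const char *p = strstr(line, "check =");
    if (p == NULL)
        return 0;
    p = strchr(p, '(');
    if (p == NULL)
        return 0;
    return sscanf(p, "(%llu/%llu)", done, total) == 2;
}

/* Assumption: mdstat reports progress in 1 KiB blocks, so multiply by 2
 * to get 512-byte sectors (the unit of sync_min/sync_max). */
static unsigned long long kib_to_sectors(unsigned long long kib)
{
    return kib * 2ULL;
}
```

Each sample would pair this position with the value read from /sys/block/mdX/md/mismatch_cnt at about the same moment, mdX being the array in question.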

Well... it would advance in steps of the size of this allocation:

    struct bio *bio = md_bio_alloc_sync(rdev->mddev);

I have trouble finding out how big that is, but I think it is larger
than 8 sectors = 1 page.

This gives a good idea of the location of the bad stripe(s). One can
then do a fine 'check' around the location(s) of the mismatches to
identify the exact location(s).
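Given such a log of (position, mismatch_cnt) samples, the windows worth a fine 'check' are the intervals between consecutive samples in which the count went up. The sketch below is illustrative; struct sample, struct window and suspect_windows are hypothetical names. The slack parameter is an assumption that widens each window, because the reported position and the counter are not sampled atomically.

```c
#include <stddef.h>
#include <stdint.h>

struct sample { uint64_t pos; uint64_t mismatches; };  /* pos in sectors */
struct window { uint64_t start, end; uint64_t added; };

/* For each pair of consecutive samples where mismatch_cnt grew, emit the
 * window [prev.pos - slack, cur.pos + slack) and how much the count grew.
 * Returns the number of windows written (at most max_out). */
static size_t suspect_windows(const struct sample *s, size_t n, uint64_t slack,
                              struct window *out, size_t max_out)
{
    size_t k = 0;
    for (size_t i = 1; i < n; i++) {
        if (s[i].mismatches <= s[i - 1].mismatches)
            continue;                       /* no new mismatches here */
        if (k == max_out)
            break;
        out[k].start = s[i - 1].pos > slack ? s[i - 1].pos - slack : 0;
        out[k].end = s[i].pos + slack;
        out[k].added = s[i].mismatches - s[i - 1].mismatches;
        k++;
    }
    return k;
}
```

The resulting windows, in sectors, can then be fed to a check bounded by sync_min/sync_max, or to a divide-and-conquer search, instead of the whole array.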

That is your refinement described above. Yes, that one might work, but
I'm not totally sure that you can bring it down arbitrarily.

Just to test that your method can work, one can create a small file of
random bytes, find its offset on both disks, change one byte with dd
directed at one disk only, and see whether mismatch_cnt is 8 or smaller,
or can somehow be brought down to 8 or smaller with your
divide-and-conquer strategy.
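
The dd step of that test could look like the sketch below. It assumes a RAID1 member whose data starts data_offset sectors into the device, i.e. the "Data Offset" that mdadm --examine reports for v1.x metadata. Under that assumption, array byte X lives at member byte data_offset × 512 + X. Finding the file's byte offset on the array is not shown; for a filesystem sitting directly on the array, filefrag -v is one way. member_byte_offset and flip_byte are hypothetical helper names. flip_byte is destructive: run it only against a scratch array.

```c
#define _XOPEN_SOURCE 700
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

/* RAID1 maps array byte X to member byte data_offset * 512 + X, where
 * data_offset is the member's data offset in 512-byte sectors
 * (assumption: as reported by mdadm --examine for v1.x metadata). */
static uint64_t member_byte_offset(uint64_t array_byte,
                                   uint64_t data_offset_sectors)
{
    return data_offset_sectors * 512u + array_byte;
}

/* Invert one byte at `off` in the member device (or image file) at
 * `path`, like a targeted dd write to one disk only. DESTRUCTIVE:
 * use only on a scratch array. Returns 0 on success, -1 on error. */
static int flip_byte(const char *path, uint64_t off)
{
    int fd = open(path, O_RDWR);
    if (fd < 0)
        return -1;
    unsigned char b;
    int rc = -1;
    if (pread(fd, &b, 1, (off_t)off) == 1) {
        b ^= 0xFF;                         /* guarantees the copies differ */
        if (pwrite(fd, &b, 1, (off_t)off) == 1 && fsync(fd) == 0)
            rc = 0;
    }
    close(fd);
    return rc;
}
```

Inverting the byte (XOR 0xFF) makes the two copies differ whatever the original value was. After the write, one would expect a 'check' to flag the block containing array sector X / 512, which gives a known target to compare the divide-and-conquer result against.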