Re: raid1 out of sync, but which files are affected?

On 27/01/2019 00:44, Eyal Lebedinsky wrote:
On 27/1/19 10:21 am, Nik.Brt. wrote:
On 26/01/2019 11:49, Harald Dunkel wrote:
I initiated a check of my RAID1 (2 disks) this morning. mismatch_cnt
is at 128 by now.

AFAIR 128 is a rounded number, and you are not going to get it more precise than this. It depends on the granularity of the check, which depends on the raid1 code.

There is no official way to do what you want.

Well, not exactly. What I did before the log messages were introduced was to run 'check' operations in small sections. I had a script that read the whole array in 32 sections by setting 'sync_min' and 'sync_max', then dealt with each faulty section by dividing it
again into 32 sections, and so on.

Trying to do a 'check' of the whole array in very small sections takes a very long time,
hence the divide-and-conquer approach.
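
A minimal sketch of that divide-and-conquer idea, not the original script. It assumes a caller-supplied function count_mismatches(lo, hi) that restricts a 'check' to sectors [lo, hi) (for example by writing 'sync_min' and 'sync_max') and returns the resulting mismatch count. That callback, the names split_range and narrow_down, and the min_size stopping rule are all hypothetical, and any alignment the kernel may require for section boundaries is not handled here.

```python
from typing import Callable, List, Tuple

Range = Tuple[int, int]  # (first_sector, end_sector_exclusive)


def split_range(lo: int, hi: int, parts: int = 32) -> List[Range]:
    """Split [lo, hi) into `parts` contiguous, non-empty pieces."""
    total = hi - lo
    parts = max(1, min(parts, total))
    step, extra = divmod(total, parts)
    pieces = []
    start = lo
    for i in range(parts):
        # The first `extra` pieces get one more sector to absorb the remainder.
        end = start + step + (1 if i < extra else 0)
        pieces.append((start, end))
        start = end
    return pieces


def narrow_down(
    lo: int,
    hi: int,
    count_mismatches: Callable[[int, int], int],
    parts: int = 32,
    min_size: int = 8,
) -> List[Range]:
    """Return the smallest sections found that still report mismatches.

    Recursion stops once a section is no larger than `min_size` sectors.
    """
    if count_mismatches(lo, hi) == 0:
        return []
    total = hi - lo
    if total <= min_size:
        return [(lo, hi)]
    # Never split finer than roughly `min_size` sectors per piece.
    n = min(parts, -(-total // min_size))
    found: List[Range] = []
    for a, b in split_range(lo, hi, n):
        found.extend(narrow_down(a, b, count_mismatches, parts, min_size))
    return found
```

Each level costs one 'check' per section, so with 32 sections an array of a billion sectors needs about six levels to get down to a handful of sectors, provided the kernel really reports mismatches at that granularity.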

Ah, this might work; good idea, I have not tried it. Have you ever got it down to 8 (which would be 4k, 1 page, the minimum addressable size)?

In raid1.c there is this code:

    atomic64_add(r1_bio->sectors, &mddev->resync_mismatches);

so it is incremented in steps of r1_bio->sectors, which I think is much higher than 8. I may once have seen a value as small as 32 (that would be 16 kbytes), so that might be the rounding rather than the 128 I mentioned, but it is not 8 either.
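
For reference, the sector arithmetic behind those numbers, as a small sketch. It assumes mismatch_cnt is counted in 512-byte sectors, which is what the figures in this thread imply; the helper name sectors_to_kib is made up for illustration.

```python
SECTOR_BYTES = 512  # assumption: mismatch_cnt counts 512-byte sectors


def sectors_to_kib(sectors: int) -> float:
    """Convert a count of 512-byte sectors to KiB."""
    return sectors * SECTOR_BYTES / 1024


# The three values discussed in this thread.
for sectors in (8, 32, 128):
    print(f"{sectors:>3} sectors = {sectors_to_kib(sectors):g} KiB")
```

So 8 sectors is one 4 KiB page, 32 sectors is 16 KiB, and 128 sectors is 64 KiB.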

What is the smallest number you have seen with this strategy?

Another way is to log /proc/mdstat for the position and 'mismatch_cnt' for the status, at short intervals (I used 10 seconds) until all the mismatches are found. One can then
stop the check by writing 'idle' to 'sync_action'.
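
A rough sketch of how such a log could be turned into suspect regions. It assumes each sample is a (position, mismatch_cnt) pair taken at the same moment, with the position parsed from the "check = N% (done/total)" progress line in /proc/mdstat. The function names are hypothetical, only the first progress line in the text is used, and the position stays in whatever unit /proc/mdstat prints (I believe 1 KiB blocks, but verify before converting to sectors).

```python
import re
from typing import List, Optional, Tuple

# Matches e.g. "check =  5.3% (51840000/976630336)".
_PROGRESS = re.compile(r"check\s*=\s*[\d.]+%\s*\((\d+)/(\d+)\)")


def check_position(mdstat_text: str) -> Optional[int]:
    """Return the 'done' figure of the first check progress line, or None."""
    m = _PROGRESS.search(mdstat_text)
    return int(m.group(1)) if m else None


def suspect_windows(samples: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Given (position, mismatch_cnt) samples in time order, return the
    (previous_position, position) pairs between which the count went up."""
    windows = []
    for (p0, c0), (p1, c1) in zip(samples, samples[1:]):
        if c1 > c0:
            windows.append((p0, p1))
    return windows
```

Mismatches that were already counted before the first sample are not detected, so the first sample should be taken right after the check starts. Each returned window is a candidate range for a finer 'check' restricted with 'sync_min' and 'sync_max'.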

Well... it would go in steps of the size of this bio:
    struct bio *bio = md_bio_alloc_sync(rdev->mddev);
I have some trouble finding out how big that is, but I think it is larger than 8 sectors = 1 page.

This gives a good idea of the location of the bad stripe(s). One can then do a fine 'check' around the location(s) of the mismatches to identify the exact location(s).


your refinement described above. Yes, that one might work, but I'm not totally sure that you can bring it down arbitrarily.

Just to test that your method can work, one can create a small file of random bytes, find its offset on both disks, change one byte with a dd directed at one disk only, and see whether mismatch_cnt is 8 or smaller, or can be brought down to 8 or smaller with your divide-and-conquer strategy.



