On Oct 7, 2014, at 8:14 AM, Ethan Wilson <ethan.wilson@xxxxxxxxxxxxx> wrote:

> On 04/10/2014 15:46, Dennis Grant wrote:
>> Hello all.
>>
>> ...
>>
>> Even after multiple checks, repairs, and rebuilds, the arrays on the
>> bigger drives (/ and /home) are showing insanely high mismatch_cnt
>> values. This has me concerned.
>>
>
> Dennis,
> since nobody more knowledgeable replied, I will try.
>
> Some mismatches on raid1 have been there since always, and nobody ever deeply investigated what they were caused by, nor if they happen on unallocated filesystem space or on real live data. It seems that if LVM is between raid1 and the filesystem then they don't happen anymore, but again nobody is really sure of why.
>
> Recently some changes in the raid1 resync algorithm introduced some bugs that could possibly generate additional mismatches, but if you haven't had resyncs then I am not so sure if such bugs and their fixes are relevant. However the fixes are here:
> https://www.kernel.org/pub/linux/kernel/v3.x/ChangeLog-3.14.20
> search for "raid".
>
> You might want to upgrade to kernel 3.14.20, which is probably not what your Ubuntu LTS has currently, then repair the arrays, then see if they grow again.
> Note that you need to do repair and not check:
> echo repair > /sys/block/md0/md/sync_action
> at the next "check" the mismatch_cnt should be 0 (not just after "repair", because that would count the number of mismatches that have been repaired).
>
> I'd say that mismatches in general are pretty worrisome, they shouldn't happen, they are likely to indicate corruption, so if what I said doesn't work, e.g. mismatches grow again, try to report it again on the list and somebody might be able to help further to track down this problem.

The mismatch count can be incremented during operations other than check and repair. I believe its behavior also varies between RAID personalities.
However, if you check 'last_sync_action' and see that it was a "check" operation, you are probably safe to assume that the mismatch count has been computed correctly. Note the following commit:

commit c4a39551451666229b4ea5e8aae8ca0131d00665
Author: Jonathan Brassow <jbrassow@xxxxxxxxxx>
Date:   Tue Jun 25 01:23:59 2013 -0500

    MD: Remember the last sync operation that was performed

    This patch adds a field to the mddev structure to track the last
    sync operation that was performed.  This is especially useful when
    it comes to what is recorded in mismatch_cnt in sysfs.  If the
    last operation was "data-check", then it reports the number of
    descrepancies found by the user-initiated check.  If it was a
    "repair" operation, then it is reporting the number of
    descrepancies repaired.  etc.

    Signed-off-by: Jonathan Brassow <jbrassow@xxxxxxxxxx>
    Signed-off-by: NeilBrown <neilb@xxxxxxx>

Relatedly, LVM makes use of the MD RAID personalities to provide its RAID capabilities. It does this by accessing MD through a thin device-mapper target called "dm-raid" (not to be confused with the similarly named userspace application, dmraid). The above-mentioned commit also changes the dm-raid module so that it reports '0' mismatches unless 'last_sync_action' was a "check". So, for dm-raid (and by extension LVM) the ambiguity in the mismatch count is gone, but the user must be careful when interpreting mismatch_cnt on plain MD arrays.

brassow
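The advice about reading 'last_sync_action' before trusting mismatch_cnt can be turned into a small check. What follows is a minimal POSIX sh sketch, not an official md or mdadm tool: it reads sync_action, last_sync_action and mismatch_cnt from an array's sysfs directory and treats mismatch_cnt as a check result only when the array is idle and the last operation was "check". The function name md_mismatch_report and the MD_SYSFS override (there only so it can be tried against a fake directory) are hypothetical, and last_sync_action exists only on kernels carrying the commit quoted in this message.

```shell
#!/bin/sh
# Minimal sketch (not an official tool): decide whether an md array's
# mismatch_cnt can be read as the result of a user-initiated "check".
# md_mismatch_report and the MD_SYSFS override are hypothetical names;
# MD_SYSFS exists only so the function can be tried on a fake directory.
# Return status: 0 = idle and last action was "check",
#                1 = count is partial or ambiguous,
#                2 = a sysfs attribute could not be read.

md_mismatch_report() {
    dev=$1
    dir=${MD_SYSFS:-/sys/block/$dev/md}

    # sync_action reads "idle" when no check/repair/resync is running;
    # anything else means mismatch_cnt may still be changing.
    action=$(cat "$dir/sync_action") || return 2
    if [ "$action" != "idle" ]; then
        echo "$dev: $action in progress, mismatch_cnt not final"
        return 1
    fi

    # last_sync_action (kernels with commit c4a39551) records which
    # operation produced the current mismatch_cnt.
    last=$(cat "$dir/last_sync_action") || return 2
    count=$(cat "$dir/mismatch_cnt") || return 2

    case $last in
        check)
            echo "$dev: check reported mismatch_cnt=$count"
            return 0
            ;;
        repair)
            # After a repair the count is what was repaired, not what remains.
            echo "$dev: repair reported mismatch_cnt=$count; run a check to confirm"
            return 1
            ;;
        *)
            echo "$dev: last sync action was '$last'; mismatch_cnt=$count is ambiguous"
            return 1
            ;;
    esac
}
```

On a live system it would be called as `md_mismatch_report md0` once the repair-then-check sequence Ethan describes has finished and /sys/block/md0/md/sync_action is back to idle. Arrays managed through dm-raid/LVM do not expose this sysfs directory, so the sketch applies to plain MD arrays only.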