Re: mismatch_cnt constantly goes up on ssd+hdd raid1

tlknv <tlknv@xxxxxxxxx> · Thu, 25 Jun 2015 18:33:16 +0300

Neil,
Thanks a lot for all the info and steps to identify the problem.

I have just discovered that I had the 'discard' mount option set even though I thought it wasn't there :-(
After removing 'discard' and forcing a 'repair', mismatch_cnt stays at 0 even after a bunch of writes and (most importantly) deletes to the partition. BTW, what are the units of mismatch_cnt? Is it 512-byte sectors or something else?
AFAIU md could potentially collect info on trimmed sectors/blocks and exclude them from mismatch checking. Couldn't it?
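
In case it is useful to anyone hitting the same thing, here is a minimal sketch of the checks described above: finding mounts that carry 'discard', reading mismatch_cnt, and requesting a 'repair'. The helper names are made up for illustration, "md0" is only a placeholder array name, and the sketch assumes the usual md sysfs files under /sys/block/<array>/md/. Writing to sync_action needs root.

```python
from pathlib import Path


def mounts_with_discard(mounts_text):
    """Return mount points whose option list contains 'discard'.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    found = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        options = fields[3].split(",")
        # Match 'discard' and 'discard=...' but not 'nodiscard'.
        if any(o == "discard" or o.startswith("discard=") for o in options):
            found.append(fields[1])
    return found


def read_mismatch_cnt(md_name, sysfs_root="/sys/block"):
    """Read the current mismatch_cnt value for the given md array."""
    path = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(path.read_text().strip())


def request_sync_action(md_name, action, sysfs_root="/sys/block"):
    """Write 'check', 'repair' or 'idle' to the array's sync_action file."""
    if action not in ("check", "repair", "idle"):
        raise ValueError("unsupported action: %r" % action)
    path = Path(sysfs_root) / md_name / "md" / "sync_action"
    path.write_text(action + "\n")


# Example use on a live system (placeholder array name, run as root):
#   print(mounts_with_discard(Path("/proc/mounts").read_text()))
#   request_sync_action("md0", "repair")
#   ... wait for the repair to finish, then run a "check" ...
#   print(read_mismatch_cnt("md0"))
```

The sysfs_root parameter exists only so the helpers can be pointed at a scratch directory for a dry run.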

I'll look at the ranges of sectors that still differ even when mismatch_cnt is 0.
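
For that comparison, a rough sketch along these lines should do: it reads two devices (or image files) in lockstep and reports inclusive ranges of 512-byte sectors whose contents differ. The function name and the chunk size are arbitrary choices for illustration. It assumes both members use the same data offset, so sector N on one corresponds to sector N on the other; the md superblock area is expected to differ between members and should be ignored. The array should be idle while this runs, otherwise in-flight writes can show up as differences.

```python
SECTOR = 512  # bytes per sector used for reporting


def differing_sector_ranges(path_a, path_b, chunk_sectors=2048):
    """Return a list of (first_sector, last_sector) inclusive ranges
    where the two files or devices differ.

    Comparison stops at the end of the shorter input.
    """
    ranges = []
    start = None  # first sector of the range currently being collected
    sector = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_sectors * SECTOR)
            b = fb.read(chunk_sectors * SECTOR)
            if not a or not b:
                break
            n = min(len(a), len(b))
            for off in range(0, n, SECTOR):
                same = a[off:off + SECTOR] == b[off:off + SECTOR]
                if not same and start is None:
                    start = sector  # a differing range begins here
                elif same and start is not None:
                    ranges.append((start, sector - 1))
                    start = None
                sector += 1
    if start is not None:
        # Close a range that runs up to the end of the compared data.
        ranges.append((start, sector - 1))
    return ranges


# Example use (placeholder device names, run as root):
#   for first, last in differing_sector_ranges("/dev/sda1", "/dev/sdb1"):
#       print(first, last)
```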

Thanks again,
Boris

25.06.2015, 10:25, "NeilBrown" <neilb@xxxxxxxx>:
>  On Thu, 25 Jun 2015 10:19:59 +0500 Roman Mamedov <rm@xxxxxxxxxxx> wrote:
>
>>   On Thu, 25 Jun 2015 11:33:35 +1000
>>   NeilBrown <neilb@xxxxxxxx> wrote:
>>
>>   > On Sun, 14 Jun 2015 20:13:16 +0300 tlknv <tlknv@xxxxxxxxx> wrote:
>>   >
>>   > > Hello,
>>   >
>>   > > I have raid 1 which mirrors a root/boot partition on 1 SSD and 2 HDDs
>>   > > (write-mostly). mismatch_cnt goes up even when there are very few
>>   > > writes to the partition as /var is mounted separately. After I update
>>   > > several packages I typically see mismatch_cnt somewhere between
>>   > > 500,000 and 2,000,000. I have read a number of threads in this DL
>>   > > but could not find an explanation of what could cause mismatch_cnt
>>   > > to grow that much. I checked md5 sums using
>>   > > /var/lib/dpkg/info/*.md5sums, and didn't see many errors, even
>>   > > though there are a few, mostly in text files which look ok to me. I
>>   > > guess when I check, all reads go to SSD (as both HDDs in this raid
>>   > > are write-mostly), and thus md5sum only shows no problem on
>>   > > SSD. Note, this partition is used as both boot and root and just in
>>   > > case here is some more info about my system:
>>   >
>>   > This does surprise me.
>>   >
>>   > I had another look at the code and there could be a bug that would let
>>   > 'check' see the difference between when the first write completes and
>>   > when the write-behind writes complete, but you would need to run the
>>   > check while the install was happening for that to be noticed, and even
>>   > then you would need to be unlucky.
>>
>>   Couldn't this be simply the normal observed effect of using TRIM on SSD?
>
>  Yes, of course it could. I try not to think about TRIM too much - makes me ill :-)
>
>  Thanks,
>  NeilBrown
>
>>   After some files are deleted, the filesystem issues a discard request. It
>>   does nothing to the HDDs, but the content of the discarded areas on the SSD is no
>>   longer deterministic (or mostly zeroed, as mentioned in the original report).
>>   So there is now a mismatch between the content of HDDs and SSD, but since it
>>   is in the area of deleted files, it doesn't affect the system in any way.