[ .. ]

> Very bad news then. Mismatches belong to occupied filesystem space. Seems
> like your data indeed got corrupted somehow, and reading from different
> drives probably returns different content for existing files.
>
> Most likely culprits that come to my mind:
>
> 1- MD raid1 bug
> 2- SSD bug (what brand and model?)

# smartctl -a /dev/sdb|grep -i model
Model Family:     Intel 520 Series SSDs
Device Model:     INTEL SSDSC2CW240A3

# smartctl -a /dev/sdc|grep -i model
Model Family:     Intel 520 Series SSDs
Device Model:     INTEL SSDSC2CW240A3

> 3- Loose SATA cable

Confirmed this is not the case.

> 4- Linux or SSD bug on trim, such as trimming wrong offsets killing live
>    data
> 5- MD does not lock regions during check, so it returns erroneous
>    mismatches for areas being written. This would be harmless, but your
>    mismatch count seems too high to me for this.

I wonder if this could be it.

> I would suggest investigating further. One idea is to find which files are
> affected; then, reading from both disks independently, you should be able
> to determine whether all the wrong data is on the same SSD (probable loose
> cable or SSD bug if the two copies differ) or evenly distributed (probable
> MD raid1 bug, or SSD bug if they are identical).
>
> The easiest, if it works, would be to determine the location of the
> mismatches and then get the filename from there.
> Unfortunately I don't think MD tells you the location of mismatches
> directly. Do you want to try the following:
> /sys/block/mdX/md/sync{_min,_max} should allow you to narrow the region of
> the next check. Then check, then cat mismatch_cnt.
> Narrow progressively until you have identified a single block. Invoke sync
> and check the same region again a couple of times, so as to be sure it is
> not due to point 5 above. Then try debugfs (in read-only mode it can be
> used with the fs mounted); there should be an option to get the inode from
> the block number... I hope the block numbers are not offset by MD... I
> think it's icheck, and then you might need find -inum to find the filename.
>
> Now it's better to inspect the file to confirm it has indeed different
> content on the two sides...
>
> - activate a write-intent bitmap for the raid1, preferably with a small
>   chunk size
> - fail one drive so as to degrade the raid1
> - drop caches with blockdev --flushbufs on the md device such as /dev/md2,
>   on the two underlying partitions such as /dev/sd[ab]2, and maybe even on
>   the two disks holding them such as /dev/sd[ab] (I'm not really sure what
>   the minimum needed is); also echo 3 > /proc/sys/vm/drop_caches
> - cp the file to another filesystem
> - reattach the drive, let it resync the differences using the bitmap
> - fail the other drive
> - drop all caches again
> - cp the file to another filesystem again
> - reattach the drive and let it resync
>
> diff the two copied files... what do you see?
>
> BTW, can your system be taken offline or is it a production system? If it
> can be taken offline, you can easily dump md5sums for all files from both
> sides of the RAID; that would be quicker.

I took a slightly different approach; hopefully this will provide the
information you are looking for.

Rebooted to a system rescue cd and did not mount the filesystem.

Before a check:

# cat /sys/devices/virtual/block/md1/md/mismatch_cnt
256

Ran a check > sync_action and re-checked the mismatch_cnt:

# cat /sys/devices/virtual/block/md1/md/mismatch_cnt
68352

Ran a repair > sync_action; mismatch_cnt still read 68352 (expected, I
needed to re-run a check).

Ran a check > sync_action:

# cat /sys/devices/virtual/block/md1/md/mismatch_cnt
0

It appears that when there are files moving around / being written to, it
can throw off the mismatch_cnt?
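For reference, spelled out as commands, the sequence above was roughly the
following (this is the standard md sysfs interface; /sys/block/md1/md/ is
the same node as /sys/devices/virtual/block/md1/md/, and you have to wait
for each pass to finish in /proc/mdstat before reading mismatch_cnt):

# cat /sys/block/md1/md/mismatch_cnt            (before: 256)
# echo check > /sys/block/md1/md/sync_action
# cat /proc/mdstat                              (wait for the check to complete)
# cat /sys/block/md1/md/mismatch_cnt            (after: 68352)
# echo repair > /sys/block/md1/md/sync_action
# cat /proc/mdstat                              (wait for the repair to complete)
# echo check > /sys/block/md1/md/sync_action
# cat /proc/mdstat                              (wait again)
# cat /sys/block/md1/md/mismatch_cnt            (now: 0)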
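If the mismatches ever come back, the narrowing you suggest above would
look something like this, I believe (an untested sketch: sync_min and
sync_max take offsets in 512-byte sectors, the 10 GiB window here is an
arbitrary example, and sync_max needs restoring to "max" when done):

# echo 0 > /sys/block/md1/md/sync_min
# echo 20971520 > /sys/block/md1/md/sync_max    (first 10 GiB of the array)
# echo check > /sys/block/md1/md/sync_action
# cat /sys/block/md1/md/mismatch_cnt            (nonzero? halve the window, repeat)
# echo max > /sys/block/md1/md/sync_max         (restore the default afterwards)

Once a single region is isolated, mapping it to a filename with debugfs
would go roughly like this, assuming ext3/4 with 4 KiB blocks and no MD
data offset (true for 0.90/1.0 metadata; with 1.2 metadata the data offset
would have to be subtracted first):

# debugfs -R "icheck 2621440" /dev/md1          (fs block = sector * 512 / 4096)
# debugfs -R "ncheck <inode>" /dev/md1          (or: find /mountpoint -inum <inode>)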
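And the degraded-copy comparison, as I understand it, would be along these
lines (again only a sketch with made-up paths; mdadm syntax from memory,
so check the man page before trying it on live data):

# mdadm --grow /dev/md2 --bitmap=internal --bitmap-chunk=4M
# mdadm /dev/md2 --fail /dev/sdc2
# blockdev --flushbufs /dev/md2
# blockdev --flushbufs /dev/sdb2 /dev/sdc2
# echo 3 > /proc/sys/vm/drop_caches
# cp /suspect/file /elsewhere/copy-from-sdb     (degraded array now reads from sdb only)
# mdadm /dev/md2 --remove /dev/sdc2
# mdadm /dev/md2 --re-add /dev/sdc2             (bitmap resyncs only the dirty chunks)
  ... wait for the resync, then repeat the flush and cp with /dev/sdb2 failed ...
# cmp /elsewhere/copy-from-sdb /elsewhere/copy-from-sdc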
Since the FS above was not mounted, it repaired OK?

Justin.