From: joystick [mailto:joystick@xxxxxxxxxxxxx]
[ .. ]

Hi,

> 1) It might be Grub writing state data to one device only during boot.
> IF the machine was rebooted at least once prior to check.

The checks (multiple) occurred after the reboot; the last uptime was ~40+
days. Also, I am using LILO here, with the checks running once a week.

> 2) Earlier discussions on this list suggested that it might be a write
> buffer becoming invalid during write because a temporary file being
> written has been deleted in the meantime and the buffer reused with
> different content even if the buffer was still in-flight for the write.
> If this is true, the region with mismatches would belong to unallocated
> space on the filesystem so would be harmless. To confirm this, one in
> your situation should write zeroes to a new file so to fill the
> filesystem, then remove the file, just prior to the check or repair
> dd if=/dev/zero of=emptyfile bs=1M ; rm emptyfile ; echo check > .........
> this should result in zero or near-zero (see next point) mismatches. I
> think nobody has tried this before so if you can try this that would be
> great.

Baseline (I had run a repair 9+ hours earlier, btw):

# echo "Before: " $(cat /sys/block/md{0,1}/md/mismatch_cnt)
Before: 0 7552

# dd if=/dev/zero of=emptyfile bs=1M
dd: error writing 'emptyfile': No space left on device
66180+0 records in
66179+0 records out
69394198528 bytes (69 GB) copied, 127.136 s, 546 MB/s
# rm emptyfile
# echo check > /sys/devices/virtual/block/md0/md/sync_action
# echo check > /sys/devices/virtual/block/md1/md/sync_action
#
# .. waiting until check done ..
#
# echo "After: " $(cat /sys/block/md{0,1}/md/mismatch_cnt)
After: 0 6016

> 3) I'm not sure if a small number of mismatches can arise when check or
> repair reads a sector that is being written to. This cannot account for
> the large number you see but could return not exactly zero when you do
> the test of previous point.

Agreed. There are some processes (logging, etc.) writing to the RAID-1 on
occasion, but when I used HDDs in a similar configuration, I never saw
this level of mismatches, and a repair would usually bring the count down
to 0 or a very small number.

> 4) Theories above do not explain why you see an improvement dropping
> caches. This is very interesting. How do you exactly drop the caches?

In short:

1. sync
2. echo 1 > /proc/sys/vm/drop_caches
3. sync
4. echo check > sync_action
[ .. ]
5. if mismatch_cnt > 0
6. repeat 1-3 above
7. echo repair > sync_action

> 5) I have an additional theory for SSDs: do you have TRIMs enabled in
> mount options, or do you perform periodic TRIMs? If yes, note that the
> SSD might return whatever from the sectors being TRIMmed, and hence the
> mismatch. See this:
> http://serverfault.com/questions/530652/background-discard-on-swap-partitions-on-linux-ssd
> do you have trim option enabled? do your SSDs have deterministic read
> data after trim?

I have TRIM (discard) enabled for / (root) only, and I only use MD RAID-1
for the /boot and / (root) filesystems. I have a 3rd SSD dedicated to
swap.

(/dev/sdb, /dev/sdc):
/dev/md0     /boot    ext3    defaults            0 0
/dev/md1     /        ext4    defaults,discard    0 0

(/dev/sdd):
/dev/sdd1    none     swap    sw                  0 0

Justin.
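
For anyone who wants to repeat the cache-drop procedure from point 4, it could be scripted roughly as below. This is only a sketch, not the exact commands used for the results above. The function names and the SYSFS_ROOT, DROP_CACHES and POLL_SECONDS variables are made up for illustration; it assumes that sync_action reads "idle" once a check or repair has finished, and it has to run as root.

```shell
#!/bin/sh
# Sketch of the sync / drop_caches / check / repair sequence.
# SYSFS_ROOT, DROP_CACHES and POLL_SECONDS are hypothetical overrides,
# not anything provided by md itself.
SYSFS_ROOT="${SYSFS_ROOT:-/sys}"
DROP_CACHES="${DROP_CACHES:-/proc/sys/vm/drop_caches}"

# Path of the md attribute directory for a device name such as md1.
md_dir() {
    printf '%s/block/%s/md' "$SYSFS_ROOT" "$1"
}

# Current mismatch_cnt value for the device.
mismatch_count() {
    cat "$(md_dir "$1")/mismatch_cnt"
}

# Steps 1-3: flush dirty data, drop the page cache, flush again.
flush_caches() {
    sync
    echo 1 > "$DROP_CACHES"
    sync
}

# Poll until the array reports no check or repair in progress.
wait_idle() {
    while [ "$(cat "$(md_dir "$1")/sync_action")" != "idle" ]; do
        sleep "${POLL_SECONDS:-10}"
    done
}

# Steps 1-7: check, then repair only if mismatches were counted.
check_and_repair() {
    dev="$1"
    flush_caches
    echo check > "$(md_dir "$dev")/sync_action"
    wait_idle "$dev"
    count="$(mismatch_count "$dev")"
    if [ "$count" -gt 0 ]; then
        flush_caches
        echo repair > "$(md_dir "$dev")/sync_action"
        wait_idle "$dev"
    fi
    echo "$dev: mismatch_cnt after check was $count"
}
```

After sourcing the file, usage would be something like "check_and_repair md1". Nothing runs until one of the functions is called.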