Re: Filesystem corruption on RAID1

Mikael Abrahamsson <swmike@xxxxxxxxx> · Sun, 20 Aug 2017 09:14:27 +0200 (CEST)

On Fri, 18 Aug 2017, Gionatan Danti wrote:

So while many (old) mismatch_cnt reports on RAID1/10 arrays were
dismissed as "don't bother, it's a harmless RAID1 thing", I really think
that some were genuine corruptions due to micro powerlosses and similar
causes.

Consider a non-clean poweroff that leaves a possible mismatch between the
RAID1 drives, after which fsck runs. It reads from the drives and fixes the
problems it finds. However, because the RAID1 drives contain different
information, some of the errors are not fixed. The next time anything comes
along, it might read from a different drive than the one fsck read from,
and now we have corruption.
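
To make the scenario concrete, here is a toy sketch in Python. It is not how md works internally: the class ToyRaid1, the function toy_fsck and the block contents are all made up for illustration, and read balancing is reduced to an explicit mirror index. It only shows that a block which looks fine on the mirror fsck reads from can still be bad on the other one.

```python
# Toy model of a two-way mirror. All names and data here are hypothetical;
# this does not reflect md's real read balancing or on-disk layout.

class ToyRaid1:
    def __init__(self, mirror_a, mirror_b):
        self.mirrors = [list(mirror_a), list(mirror_b)]

    def read(self, block, mirror):
        # A real array picks the mirror itself; here the caller chooses.
        return self.mirrors[mirror][block]

    def write(self, block, data):
        # Writes always go to every mirror.
        for m in self.mirrors:
            m[block] = data

    def mismatches(self):
        a, b = self.mirrors
        return [i for i in range(len(a)) if a[i] != b[i]]


def toy_fsck(array, mirror):
    """Rewrite every block that looks damaged on the mirror fsck reads from."""
    fixed = []
    for block in range(len(array.mirrors[mirror])):
        if array.read(block, mirror) == "garbage":
            array.write(block, "fixed")
            fixed.append(block)
    return fixed


# After the power loss: block 2 is bad on both mirrors,
# block 1 is bad only on mirror 1.
array = ToyRaid1(["ok", "ok", "garbage"], ["ok", "garbage", "garbage"])

fixed = toy_fsck(array, mirror=0)   # fsck happens to read from mirror 0
later_read = array.read(1, mirror=1)  # a later read lands on mirror 1
```

In this toy run fsck repairs block 2 on both mirrors but never sees the damage in block 1, so the mirrors still differ there and the later read returns the bad copy.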

Wouldn't it make sense to have an option where fsck does its reads and the
md layer runs "repair" on all stripes that fsck touches? Whatever
information is handed off to fsck, the copies are always checked against
each other (and repaired) if there is a mismatch.

The problem with issuing a "repair" action here is that it might actually
copy data from the drive that fsck didn't read from. Then, even though
fsck thought it had made everything clean in the fs, it might no longer be
clean, because md "repair" copied non-clean information to the drive that
fsck looked at and deemed to be OK.
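
The same concern as a toy Python sketch. The function toy_repair and the block contents are hypothetical, and the sketch assumes that "repair" simply treats one fixed mirror as the source and copies it over the other wherever they differ, because it has no way of knowing which copy is the good one.

```python
# Toy model only. ASSUMPTION: repair picks a fixed source mirror and
# overwrites the others with it; all names and data are made up.

def toy_repair(mirrors, source=0):
    """Copy the source mirror over every other mirror where they differ."""
    repaired = []
    src = mirrors[source]
    for i, mirror in enumerate(mirrors):
        if i == source:
            continue
        for block, data in enumerate(src):
            if mirror[block] != data:
                mirror[block] = data
                repaired.append(block)
    return repaired


# Block 1 is stale on mirror 0 only. fsck read everything from mirror 1
# and found nothing to complain about.
mirrors = [["ok", "stale", "ok"], ["ok", "ok", "ok"]]
fsck_view = list(mirrors[1])

repaired = toy_repair(mirrors, source=0)
```

After the toy repair the two mirrors agree again, but they agree on the stale block, so what is now on mirror 1 is no longer what fsck looked at.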

--
Mikael Abrahamsson    email: swmike@xxxxxxxxx