Re: Filesystem corruption on RAID1

On Sun, 20 Aug 2017, Chris Murphy wrote:

Since md doesn't read from both mirrors, it's possible there's a read from a non-corrupt drive, which presents good information to fsck, which then sees no reason to fix anything in that block; but the other mirror does have corruption which thus goes undetected.

That was exactly what I wrote.

One way of dealing with it is to scrub (repair) so they both have the same information to hand over to fsck. Fixups then get replicated to disks by md.

Yes, it is, but that would require a full repair pass before running fsck. That seems excessive, since it takes hours on larger drives.

Another way is to split the mirror (make one device faulty), and then
fix the remaining drive (now degraded). If that goes well, the 2nd
device can be re-added. Here's a caveat, though: how it resyncs will
depend on the write-intent bitmap being present. I have no idea if
write-intent bitmaps on two drives can get out of sync and what the
ensuing behavior is, but I'd like to think md will discover the fixed
drive event count is higher than the re-added one, and if necessary
does a full resync, rather than possibly re-introducing any
corruption.

This doesn't solve the problem, because it never checks whether the second mirror is out of sync with the first. It only tracks writes made to the degraded array and syncs those. It doesn't fix the "fsck read the block and it was fine, but on the second drive it's not fine" case.
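To make that concrete, here is a toy model in Python. It is not md code: the block lists, the `readd_with_bitmap` helper and the "dirty set" standing in for the write-intent bitmap are all made up for illustration, and real bitmaps track chunks rather than single blocks.

```python
def readd_with_bitmap(active, stale, dirty_blocks):
    """Toy re-add: copy only the blocks written while the array was degraded."""
    for i in sorted(dirty_blocks):
        stale[i] = active[i]
    return stale


# Two mirrors; block 2 is silently corrupt on disk B only.
disk_a = ["sb", "inode", "dir", "data"]
disk_b = ["sb", "inode", "dir-CORRUPT", "data"]

# Disk B is failed out, and fsck runs on the degraded array (disk A only).
# It reads block 2, which looks fine on A, and only rewrites block 1.
dirty = set()
disk_a[1] = "inode-fixed"
dirty.add(1)

# Re-add disk B: only the dirty block is copied over.
readd_with_bitmap(disk_a, disk_b, dirty)

# Blocks that still differ between the mirrors after the re-add.
mismatches = [i for i, (a, b) in enumerate(zip(disk_a, disk_b)) if a != b]
print(mismatches)
```

In this model, the block fsck only read is never touched by the resync, so the bad copy on the second drive survives.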

In that case fsck would have to be modified to rewrite every block it reads, marking them dirty so that they get synced.

However, this again has the problem that if the remaining drive of the degraded array hits an unrecoverable read error (URE), things will fail.

The only way to solve this is to add code that implements a new mode, which would be "repair-on-read".

I understand that we can't necessarily detect which drive has the right information and which has the wrong, but this way we can at least make sure that when fsck is done, all the inodes and other metadata are consistent. Everything fsck touched during its run will be consistent across all drives, with correct parity. It might not contain the "best" information that a more intelligent algorithm or extra metadata could have presented, but it's still better than today, when after an fsck run you don't know whether parity is correct or not.

It would also be a good diagnostic tool for admins. If you suspect you're getting inconsistencies and can accept the performance degradation, md could log each inconsistency somewhere so you know about them.
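Roughly what I have in mind, as a toy Python sketch rather than anything resembling actual md code. No such mode exists in md today; the `repair_on_read` function, the in-memory "mirrors" and the choice of always trusting mirror 0 are assumptions made purely for illustration.

```python
import logging

log = logging.getLogger("repair_on_read")


def repair_on_read(mirrors, block, source=0):
    """Read `block` from every mirror; on mismatch, rewrite all from `source`.

    There is no way to know which copy is "right", so this simply picks one
    (mirror `source`) and makes the others agree with it.
    """
    copies = [m[block] for m in mirrors]
    chosen = copies[source]
    if any(c != chosen for c in copies):
        # Diagnostic for the admin: a mismatch was found and overwritten.
        log.warning("block %d differs across mirrors; rewriting from mirror %d",
                    block, source)
        for m in mirrors:
            m[block] = chosen
    return chosen


# Block 2 is silently corrupt on the second mirror only.
disk_a = [b"sb", b"inode", b"dir"]
disk_b = [b"sb", b"inode", b"dir-CORRUPT"]
mirrors = [disk_a, disk_b]

# Pretend fsck happens to read blocks 0 and 2.
for blk in (0, 2):
    repair_on_read(mirrors, blk)
```

After the loop, every block fsck read is identical on both mirrors, and the mismatch on block 2 has been logged. Whether mirror 0 actually held the better copy is a separate question this does not try to answer.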

--
Mikael Abrahamsson    email: swmike@xxxxxxxxx


