Re: Filesystem corruption on RAID1

Mikael Abrahamsson <swmike@xxxxxxxxx> · Sun, 20 Aug 2017 17:48:50 +0200 (CEST)

On Mon, 21 Aug 2017, Adam Goryachev wrote:

> data (even where it is wrong). So just do a check/repair which will
> ensure both drives are consistent, then you can safely do the fsck.
> (Assuming you fixed the problem causing random write errors first).

This involves manual intervention.
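
For reference, here is roughly what that manual check/repair step looks like when scripted against md's sysfs interface. This is a minimal sketch that assumes root access and an array directory such as /sys/block/md0/md (md0 is only an example name). It writes "repair" to sync_action, polls until the array reports "idle", and then returns the value of mismatch_cnt. The function name md_repair_and_wait is made up for illustration.

```python
import time
from pathlib import Path
from typing import Optional


def md_repair_and_wait(md_dir: Path, poll_seconds: float = 5.0,
                       timeout: Optional[float] = None) -> int:
    """Start an md "repair" pass and block until the array is idle again.

    md_dir is the array's sysfs directory, e.g. /sys/block/md0/md
    (md0 is just an example). Returns mismatch_cnt once the pass is done.
    """
    action = md_dir / "sync_action"
    action.write_text("repair\n")  # same as: echo repair > .../sync_action
    deadline = None if timeout is None else time.monotonic() + timeout
    # Poll until the kernel reports that no sync action is running.
    while action.read_text().strip() != "idle":
        if deadline is not None and time.monotonic() > deadline:
            raise TimeoutError(f"{md_dir}: repair still running")
        time.sleep(poll_seconds)
    return int((md_dir / "mismatch_cnt").read_text().strip())
```

Only after this returns would you run the filesystem's own fsck. The sketch deliberately leaves that part out, and someone still has to know to do both steps, in that order.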

While I don't know how to implement this, let's at least try to
architect something, if only to have something to throw ideas around.

What about having an option, for any RAID level, that would do "repair on
read"? You could set it to "0" or "1". For RAID1, it would read all
mirrors and, if they are inconsistent, pick one copy and write it to all
of them. It could also be some kind of ioctl option, I guess. For RAID5/6,
it would read all data drives and check parity; if parity is wrong, it
would rewrite parity.
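
To make the idea a bit more concrete, here is a toy model of the two repair-on-read paths. It works on in-memory blocks, not real devices, and every function name is hypothetical. Two assumptions go beyond the proposal: the proposal only says "pick one" for RAID1, so the sketch lets the first mirror win, and for RAID5/6 it only recomputes the XOR (P) parity. A real RAID6 implementation would also have to check and rewrite the Q syndrome, which is omitted here.

```python
from __future__ import annotations

from functools import reduce


def raid1_read_repair(mirrors: list[bytearray]) -> tuple[bytes, bool]:
    """Read one block from every mirror; if the copies disagree, pick one
    and write it back to all mirrors. Returns (data, repaired)."""
    chosen = bytes(mirrors[0])  # assumption: the first mirror wins
    repaired = any(bytes(m) != chosen for m in mirrors[1:])
    if repaired:
        for m in mirrors:
            m[:] = chosen
    return chosen, repaired


def xor_parity(chunks: list[bytearray]) -> bytes:
    """P parity: byte-wise XOR of all (equal-length) data chunks in a stripe."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def raid5_read_repair(data: list[bytearray],
                      parity: bytearray) -> tuple[bytes, bool]:
    """Read all data chunks of a stripe, recompute parity, and rewrite the
    parity chunk if it does not match. Returns (stripe data, repaired)."""
    expected = xor_parity(data)
    repaired = bytes(parity) != expected
    if repaired:
        parity[:] = expected
    return b"".join(bytes(c) for c in data), repaired
```

Note that the RAID5 path trusts the data chunks and only rewrites parity, which matches the proposal. It cannot tell which chunk is actually wrong, any more than the RAID1 path can tell which mirror is.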

This could mean that if filesystem developers wanted to do a repair
(whether as a userspace option or a mount option), the fsck-like
operation would use the aforementioned option for all its reads, to make
sure that metadata was consistent while the fsck ran. This would differ
between tools, depending on whether it's an "fs needs to be mounted"-type
of filesystem or an "offline fsck"-type one. Then it could go back to
normal operation for everything else, where a mismatch would hopefully
not cause a catastrophic failure of the filesystem, but just corruption
of an individual file.
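
The "switch it on for fsck, then go back to normal" part could look something like the scoped toggle below. Everything here is hypothetical: md has no repair-on-read knob today, so the Array class just models the proposed 0/1 option as a flag, and the actual filesystem check is passed in by the caller. The point of the sketch is that the previous setting is restored even if the check fails partway through.

```python
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Callable, Iterator


@dataclass
class Array:
    """Stand-in for an md array. repair_on_read models the proposed 0/1
    option; no such knob exists in md today."""
    name: str
    repair_on_read: bool = False


@contextmanager
def repair_on_read_enabled(array: Array) -> Iterator[Array]:
    """Turn repair-on-read on for the duration of a block, then restore
    whatever setting was in effect before, even if the block raises."""
    previous = array.repair_on_read
    array.repair_on_read = True
    try:
        yield array
    finally:
        array.repair_on_read = previous


def fsck_with_consistent_metadata(array: Array,
                                  check: Callable[[Array], bool]) -> bool:
    """Run a caller-supplied filesystem check while every read is also a
    repair, then drop back to normal reads for regular I/O."""
    with repair_on_read_enabled(array):
        return check(array)
```

An online ("needs to be mounted") filesystem might instead wrap only its metadata reads this way rather than a whole fsck run. That is the tool-dependent part mentioned above.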

--
Mikael Abrahamsson    email: swmike@xxxxxxxxx