On Mon, 21 Aug 2017, Adam Goryachev wrote:
data (even where it is wrong). So just do a check/repair which will
ensure both drives are consistent, then you can safely do the fsck.
(Assuming you fixed the problem causing random write errors first).
This involves manual intervention.
While I don't know how to implement this, let's at least see if we can
architect something for throwing ideas around.
What about having an option for any raid level that would do "repair on
read"? It could be set to "0" or "1". For RAID1, "1" would mean it reads
all copies and, if there is an inconsistency, picks one and writes it to
all of them. It could also be some kind of IOCTL option I guess. For
RAID5/6, read all data drives and check parity. If parity is wrong,
write parity.
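To make the idea a bit more concrete, here is a rough Python sketch of
what "repair on read" could do per read. It is only a model for
discussion, not md code: the function names are made up, "pick one" is
arbitrarily taken to mean the first copy, and only plain XOR parity is
handled (the second RAID6 syndrome is left out).

```python
from functools import reduce


def repair_on_read_raid1(copies):
    """Model of a RAID1 read with repair enabled.

    copies: list of bytes objects, one block per mirror.
    Returns (data, repaired_copies, was_repaired).
    """
    # "Pick one": choosing the first copy is an assumption of this sketch.
    chosen = copies[0]
    mismatch = any(c != chosen for c in copies)
    if mismatch:
        # Write the chosen copy back to every mirror.
        copies = [chosen] * len(copies)
    return chosen, copies, mismatch


def xor_blocks(blocks):
    """XOR equally sized blocks together byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair_on_read_raid5(data_blocks, parity):
    """Model of a RAID5 stripe read with repair enabled.

    Reads all data blocks, recomputes XOR parity, and returns the parity
    that should be on disk. The data is trusted as-is; only parity is
    rewritten. Returns (data_blocks, parity_to_store, was_repaired).
    """
    expected = xor_blocks(data_blocks)
    mismatch = expected != parity
    return data_blocks, expected, mismatch
```

Note that in both cases the array ends up consistent, not necessarily
correct: the data that gets kept may still be the wrong one.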
This could mean that if filesystem developers wanted to do repair (and
this could be a userspace option or mount option), the tool would use
the aforementioned option for all fsck-like operations to make sure that
metadata was consistent while doing fsck (this would be different for
different tools, depending on whether it's an "fs needs to be
mounted"-type of fs or an "offline fsck"-type filesystem). Then it could
go back to normal operation for everything else, which would hopefully
not cause catastrophic failures to the filesystem, but instead just
individual file corruption in case of mismatches.
--
Mikael Abrahamsson email: swmike@xxxxxxxxx