Re: Filesystem corruption on RAID1

Wols Lists <antlists@xxxxxxxxxxxxxxx> · Sun, 20 Aug 2017 17:10:20 +0100

On 20/08/17 16:48, Mikael Abrahamsson wrote:
> On Mon, 21 Aug 2017, Adam Goryachev wrote:
> 
>> data (even where it is wrong). So just do a check/repair which will
>> ensure both drives are consistent, then you can safely do the fsck.
>> (Assuming you fixed the problem causing random write errors first).
> 
> This involves manual intervention.
> 
> While I don't know how to implement this, let's at least see if we can
> architect something for throwing ideas around.
> 
> What about having an option for any raid level that would do "repair on
> read". So you can do "0" or "1" on this. RAID1 would mean it reads all
> stripes and if there is inconsistency, pick one and write it to all of
> them. It could also be some kind of IOCTL option I guess. For RAID5/6,
> read all data drives, and check parity. If parity is wrong, write parity.
> 
> This could mean that if filesystem developers wanted to do repair (and
> this could be a userspace option or mount option), it would use the
> aforementioned option for all fsck-like operations to make sure that
> metadata was consistent while doing fsck (this would be different for
> different tools, depending on whether it's an "fs needs to be mounted"
> type of fs or an "offline fsck" type of filesystem). Then it could go
> back to normal operation for everything else, which would hopefully not
> cause catastrophic failures to the filesystem, but instead just
> individual file corruption in case of mismatches.
> 
Look for the thread "RFC Raid error detection and auto-recovery", 10th May.

Basically, that proposed a three-way flag: "default" is the current
behaviour ("read the data section"); "check" would read the entire
stripe, compare the mirrors or calculate parity, and return a read error
if it couldn't work out the correct data; and "fix" would also write the
correct data back if it could work it out.

So basically, on a two-disk raid-1, or raid 4 or 5, both "check" and
"fix" would return read errors if there's a problem, because a mismatch
can be detected but there's no way to tell which copy or block is the
bad one, and you're SOL without a backup.

With a raid-1 of three or more disks, or raid-6, "check" would return the
correct data (and "fix" would also repair the stripe) if it could,
otherwise again you're SOL.
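
To make the three modes concrete, here is a rough toy model for the mirror
case only. It is a sketch, not md code: the names ReadMode, ReadError and
mirror_read are made up for illustration, each "disk" is just one bytes
value in a list, and picking the winner by strict majority vote is my
assumption about how "work out the correct data" could be done on raid-1.

```python
from collections import Counter
from enum import Enum


class ReadMode(Enum):
    # Hypothetical names for the three-way flag described above.
    DEFAULT = "default"  # read one copy and trust it
    CHECK = "check"      # compare all copies, error if undecidable
    FIX = "fix"          # as CHECK, and write the winning data back


class ReadError(Exception):
    """The copies disagree and no copy can be trusted."""


def mirror_read(copies, mode=ReadMode.DEFAULT):
    """Read one block from a toy raid-1; `copies` has one bytes value per disk."""
    if mode is ReadMode.DEFAULT:
        return copies[0]  # no comparison at all

    winner, votes = Counter(copies).most_common(1)[0]
    if votes == len(copies):
        return winner  # all mirrors agree

    # Assumption: a strict majority is needed to call one copy "correct".
    if votes * 2 <= len(copies):
        raise ReadError("mirrors disagree and no majority exists")

    if mode is ReadMode.FIX:
        copies[:] = [winner] * len(copies)  # overwrite the minority copies
    return winner
```

With two disks a mismatch can never reach a majority, so both "check" and
"fix" raise ReadError; with three disks a single bad copy is outvoted.
Raid-6 would need the parity maths instead of a vote, which this sketch
does not attempt.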

Cheers,
Wol