Re: nonzero mismatch_cnt with no earlier error

Neil Brown <neilb@xxxxxxx> · Mon, 26 Feb 2007 15:36:04 +1100

On Saturday February 24, eyal@xxxxxxxxxxxxxx wrote:
> But is this not a good opportunity to repair the bad stripe for a very
> low cost (no complete resync required)?

In this case, 'md' knew nothing about an error.  The SCSI layer
detected something and thought it had fixed it itself.  Nothing for md
to do.

> 
> At time of error we actually know which disk failed and can re-write
> it, something we do not know at resync time, so I assume we always
> write to the parity disk.

md only knows of a 'problem' if the lower level driver reports one.
If it reports a problem for a write request, md will fail the device.
If it reports a problem for a read request, md will try to over-write
the failed block with correct data.
But if the driver doesn't report the failure, there is nothing md can
do.
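
As a rough illustration of the three cases above, here is a minimal
sketch in Python.  It is a toy model only, not md's actual code: the
names Action and md_reaction are made up for this example.

```python
from enum import Enum


class Action(Enum):
    # Hypothetical labels for the three outcomes described above.
    NONE = "nothing for md to do"
    FAIL_DEVICE = "fail the device"
    REWRITE_BLOCK = "over-write the failed block with correct data"


def md_reaction(op, driver_reported_error):
    """Toy model of how md reacts to a request of type op ("read" or "write")."""
    if not driver_reported_error:
        # The lower layer stayed silent (or fixed things itself), so md
        # never learns that anything went wrong.
        return Action.NONE
    if op == "write":
        return Action.FAIL_DEVICE
    if op == "read":
        return Action.REWRITE_BLOCK
    raise ValueError("unknown request type: %r" % (op,))
```

The first branch is the case from this thread: the SCSI layer handled
the error on its own, so md was never told about it.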

When performing a check/repair md looks for inconsistencies and fixes
them 'arbitrarily'.  For raid5/6, it just 'corrects' the parity.  For
raid1/10, it chooses one block and over-writes the other(s) with it.
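
To show why such a fix is 'arbitrary', here is a minimal sketch in
Python of the two repair styles.  It assumes a single XOR parity block
per stripe (so it models raid5 only, not raid6's second syndrome) and
byte strings standing in for disk blocks; the function names are
invented for this example and are not md's.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def repair_raid5_stripe(data_blocks, parity):
    """Recompute parity from the data blocks, which are trusted as-is.

    Returns (new_parity, mismatch_found).
    """
    expected = xor_blocks(data_blocks)
    return expected, expected != parity


def repair_raid1(copies, chosen=0):
    """Over-write every mirror with the chosen copy.

    Returns (new_copies, mismatch_found).
    """
    source = copies[chosen]
    mismatch = any(copy != source for copy in copies)
    return [source] * len(copies), mismatch
```

In this model, if a data block rather than the parity block holds the
bad contents, the stripe still becomes consistent, but around the wrong
data.  Likewise the raid1 repair keeps whichever copy was chosen,
whether or not it is the good one.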

Mapping these corrections back to blocks in files in the filesystem is
extremely non-trivial.

NeilBrown