Re: using the raid6check report

On Mon, Jan 09 2017, Piergiorgio Sartor wrote:

> On Sun, Jan 08, 2017 at 08:52:40PM +0000, Wols Lists wrote:
>> On 08/01/17 17:40, Piergiorgio Sartor wrote:
>> > On Fri, Dec 23, 2016 at 11:56:34AM +1100, Eyal Lebedinsky wrote:
>> >> > From time to time I get non-zero mismatch_count in the weekly scrub. The way I handle
>> >> > it is to run a check around the stripe (I have a background job printing the mismatch
>> >> > count and /proc/mdstat regularly) which should report the same count.
>> >> > 
>> >> > I now drill into the fs to find which files use this area, deal with them and delete
>> >> > the bad ones. I then run a repair on that small area.
>> >> > 
>> >> > I have now found out about raid6check, which can actually tell me which disk holds the bad data.
>> >> > This is something raid6 should be able to do assuming a single error.
>> >> > Hoping it is one bad disk, the simple solution now is to recover the bad stripe on
>> >> > that disk.
>> >> > 
>> >> > Will a 'repair' rewrite the bad disk or just create fresh P+Q which may just make the
>> >> > bad data invisible to a 'check'? I recall this being the case in the past.
>> > "repair" should fix the data which is assumed
>> > to be wrong.
>> > It should not simply correct P+Q, but really
>> > find out which disk is not OK and fix it.
>> > 
>> Having just looked at the man page and the source to raid6check as found
>> online ...
>> 
>> "man raid6check" says that it does not write to the disk. Looking at the
>> source, it appears to have code that is intended to write to the disk
>> and repair the stripe. So what's going on?
>
> There was a patch adding the write capability,
> but it likely only updated the C code, not the man page.
>
>> 
>> I can add it to the wiki as a little programming project, but it would
>> be nice to know the exact status of things - my raid-fu isn't good
>> enough at present to read the code and work out what's going on.
>> 
>> It would be nice to be able to write "parity-check" or somesuch to
>> sync_action, and then for raid5 it would check and update parity, or
>> for raid6 it would check and correct data/parity.
>
> At that time, the agreement with Neil was to do
> such things in user space and not inside the
> md raid "driver" (so to speak) in kernal space.

This is correct.

With RAID6 it is possible to determine, with high reliability, if a
single device is corrupt.  There is a mathematical function that can be
calculated over a set of bytes, one from each device.  If the result is
a number less than the number of devices in the array (including P and
Q), then the device with that index number is corrupt (or at least, both
P and Q can be made correct again by simply changing that one byte).  If
we compute that function over all 512 (or 4096) bytes in a stripe and
they all report the same device (or report that there are no errors for
some bytes) then it is reasonable to assume the block on the identified
device is corrupt.
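
A minimal sketch of that calculation, for the curious.  This is not the
raid6check source, just an illustration in Python using the 0x11d
reduction polynomial and generator 2 that Linux RAID6 uses; the function
names are made up for the example:

# GF(2^8) multiply, reducing with the 0x11d polynomial (as Linux RAID6 does).
def gf_mul(a, b):
    r = 0
    for _ in range(8):
        if b & 1:
            r ^= a
        b >>= 1
        hi = a & 0x80
        a = (a << 1) & 0xff
        if hi:
            a ^= 0x1d
    return r

# log/antilog tables for the generator g = 2.
EXP = [0] * 256
LOG = [0] * 256
x = 1
for i in range(255):
    EXP[i] = x
    LOG[x] = i
    x = gf_mul(x, 2)

def locate_error(data, p, q):
    """For one byte offset: data is one byte from each data device,
    p and q are the stored P and Q bytes for the same offset.
    Returns 0..n-1 for a corrupt data device, n for P, n+1 for Q,
    None if everything is consistent, or -1 if more than one device
    must be wrong."""
    n = len(data)
    p_calc, q_calc = 0, 0
    for i, d in enumerate(data):
        p_calc ^= d
        q_calc ^= gf_mul(EXP[i], d)      # Q = sum of g^i * D_i
    p_syn, q_syn = p_calc ^ p, q_calc ^ q
    if p_syn == 0 and q_syn == 0:
        return None                      # no inconsistency at this offset
    if q_syn == 0:
        return n                         # only P disagrees -> P looks bad
    if p_syn == 0:
        return n + 1                     # only Q disagrees -> Q looks bad
    # Both disagree: for a single bad data device z, q_syn = g^z * p_syn,
    # so z is the discrete log of q_syn divided by p_syn.
    z = (LOG[q_syn] - LOG[p_syn]) % 255
    return z if z < n else -1            # not explainable by one device

Running this over every byte offset in the block and getting the same
index (or None) each time is the "reasonable to assume" case described
above.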

raid6check does this and provides very useful functionality for a
sysadmin to determine which device is corrupt, and to then correct that
if they wish.

However, I am not comfortable with having that be done transparently
without any confirmation from the sysadmin.  This is because I don't
have a credible threat model for how the corruption could have happened
in the first place.  I understand how hardware failure can make a whole
device inaccessible, and how media errors can cause a single block to be
unreadable.  But I don't see a "most likely way" that a single block can
become corrupt.

Without a clear model, I cannot determine what the correct response is.
The corruption might have happened on the write path ... so re-writing
the block could just cause more corruption.  It could have happened on
the read path, so re-writing won't change anything.  It could have
happened in memory, so nothing can be trusted.  It could have happened
due to buggy code.  Without knowing the cause with high probability, it
is not safe to try to fix anything.

The most likely cause of incorrect P and Q is that the machine crashed
while a stripe was being updated.  In that case, simply updating P and Q
is the correct response.  So that is the only response that the kernel
performs.
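
For reference, that kernel-side pass is driven through sysfs.  A small
sketch of the interface (the array name "md0" and the helper names are
assumptions for the example; sync_action and mismatch_cnt are the
standard md attributes):

from pathlib import Path

# Ask the md driver to act on an array via its sync_action attribute.
# "check" only counts mismatches; "repair" recomputes and rewrites P/Q.
def md_sync_action(array="md0", action="check"):
    Path(f"/sys/block/{array}/md/sync_action").write_text(action + "\n")

# Number of mismatched sectors found by the last check/repair pass.
def md_mismatch_cnt(array="md0"):
    return int(Path(f"/sys/block/{array}/md/mismatch_cnt").read_text())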

For more reading, see http://neil.brown.name/blog/20100211050355

NeilBrown


>
> So, as far as I know, the kernel md code can
> check the parity and, possibly, re-write.
>
> "raid6check" can detect errors *and*, if only one,
> where it is, so a "data repair" capability is possible.
>
> bye,
>
> pg
>  
>> Cheers,
>> Wol
>
> -- 
>
> piergiorgio
