Re: md road-map: 2011

Phil Turmel <philip@xxxxxxxxxx> · Thu, 17 Feb 2011 13:46:58 -0500

On 02/16/2011 10:10 PM, NeilBrown wrote:
> On Wed, 16 Feb 2011 20:14:50 -0500 Phil Turmel <philip@xxxxxxxxxx> wrote:
> 
>> On 02/16/2011 07:52 PM, NeilBrown wrote:
> 
>>> So when you do the computation on all of the bytes in all of the blocks you
>>> get a block full of answers.
>>> If the answers are all the same - that tells you something fairly strong.
>>> If they are "all different" then that is also a fairly strong statement.
>>> But what if most are the same, but a few are different?  How do you interpret
>>> that?
>>
>> Actually, I was thinking about that.  (You suckered me into reading that PDF
>> some weeks ago.)  I would be inclined to allow the kernel to make corrections
>> where "all the same" covers individual sectors, per the sector size reported
>> by the underlying device.
> 
> To see why I am strongly against having the kernel make automatic
> corrections like this, see
> 
>     http://neil.brown.name/blog/20100211050355

I read it, and slept on it, and my gut wants to argue.  But I have no data to
back me up.  I think I'll take a stab at reporting inconsistencies via simple
printk with a sysfs on/off switch.

>> Also, the comparison would have to ignore "neutral bytes", where P & Q
>> happened to be correct for that byte position.
>>
>>> The point I'm trying to get to is that the result of this RAID6 calculation
>>> isn't a simple "that device is bad".  It is a block of data that needs to be
>>> interpreted.
>>>
>>> I'd rather have user-space do that interpretation, so it may as well do the
>>> calculation too.
>>>
>>> If you wanted to do it in the kernel, you would need to be very clear about
>>> what information you provide, what it means exactly, and why it is sufficient.
>>
>> Given that the hardware is going to do error correction and checking at a
>> sector size granularity, and the kernel would in fact rewrite that sector using
>> this calculation if the hardware made a "fairly strong" statement that it can't
>> be trusted, I'd argue that rewriting the sector is appropriate.
> 
> What the RAID6 calculation tells you is that something cannot be trusted.  It
> doesn't tell you what.  It could be the controller, the cable, the drive
> logic, or the rust on the media.  Without that knowledge, correction can be
> dangerous.

True, but inconsistent data is also dangerous, as traffic on this list shows.  The
question is, "When is it safer to correct than to leave alone?"  I don't think
there's enough data to answer that, unless you have some pointers to studies that
address it.

Either way, a reporting method is needed, and might give us some numbers to work
with.
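
For illustration, here is a rough user-space sketch of the per-byte
calculation discussed above.  It is not md code.  It assumes the usual RAID6
field (GF(2^8), polynomial 0x11d, generator 2) and at most one bad device per
byte position; the function names (compute_pq, locate, verdict) are made up
for this example.

```python
# Illustrative only: per-byte RAID6 fault location in user space.
# Assumes GF(2^8) with polynomial 0x11d and generator 2.

EXP = [0] * 255   # EXP[i] = 2**i in the field
LOG = [0] * 256   # LOG[EXP[i]] = i; LOG[0] is never consulted
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11d


def gf_mul(a, b):
    """Multiply two field elements using the log/exp tables."""
    if a == 0 or b == 0:
        return 0
    return EXP[(LOG[a] + LOG[b]) % 255]


def compute_pq(data_blocks):
    """Return (P, Q) for equal-length data blocks, one block per data disk."""
    n = len(data_blocks[0])
    p = bytearray(n)
    q = bytearray(n)
    for disk, block in enumerate(data_blocks):
        coeff = EXP[disk % 255]          # 2**disk in the field
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def locate(data_blocks, stored_p, stored_q):
    """Return one answer per byte position.

    None   neutral byte: P and Q both match
    "P"    only P mismatches
    "Q"    only Q mismatches
    int    index of the data disk that would explain both mismatches
    "?"    mismatch that no single device explains
    """
    calc_p, calc_q = compute_pq(data_blocks)
    answers = []
    for cp, cq, sp, sq in zip(calc_p, calc_q, stored_p, stored_q):
        dp, dq = cp ^ sp, cq ^ sq
        if dp == 0 and dq == 0:
            answers.append(None)
        elif dq == 0:
            answers.append("P")
        elif dp == 0:
            answers.append("Q")
        else:
            # Single data-disk error e on disk z: dp = e, dq = 2**z * e.
            z = (LOG[dq] - LOG[dp]) % 255
            answers.append(z if z < len(data_blocks) else "?")
    return answers


def verdict(answers):
    """Distinct answers over a sector, ignoring neutral bytes."""
    return {a for a in answers if a is not None}
```

Run over one sector's worth of bytes, a verdict with exactly one element is
the "all the same" case; more than one element is the mixed case that still
needs interpreting, and is the sort of thing a report would have to expose.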

Phil