Re: Why does one get mismatches?

Piergiorgio Sartor <piergiorgio.sartor@xxxxxxxx> · Thu, 11 Feb 2010 19:12:04 +0100

Hi all,

> > This whole discussion simply shows that for RAID-1 software RAID is less 
> > reliable than hardware RAID (no, I don't mean fake-RAID), because it 
> > doesn't pin the data buffer until all copies are written.
> > 
> 
> That doesn't make it less reliable.  It just makes it more confusing.

well, sorry to say, but it makes it useless.

The problem is: how can we be sure that the FS really
plays tricks only with blocks which will be unused?

In other words, either there should be an agreed and
confirmed interface between caller (FS) and called (MD),
handling the situation properly (i.e. the FS will not
do these pranks), or the called (MD) should be robust
against all possible nasty things the caller (FS) can do.

Because what will happen if someone introduces a new
FS which works fine with everything but software RAID?

Similarly, I have some identical PCs with RAID-10 f2.

Starting with Fedora 12, there is a weekly check of
the RAID array (with email notification, BTW without
mismatch count...).

On these PCs I get mismatches, sometimes.
Checking the mismatch count, I found that it changes from
run to run: sometimes a bit more, sometimes a bit less (or zero).
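
For reference, the count can be read from sysfs after each check,
since the notification mail does not include it. A minimal sketch,
assuming the standard md attribute /sys/block/mdX/md/mismatch_cnt
(a count of sectors, per the kernel md documentation); the function
names and the sysfs_root parameter are hypothetical, the latter
added only so the code can be tried against a fake directory tree:

```python
from pathlib import Path


def read_mismatch_cnt(md_name, sysfs_root="/sys/block"):
    """Return the mismatch count left by the last check/repair of one array."""
    path = Path(sysfs_root) / md_name / "md" / "mismatch_cnt"
    return int(path.read_text().strip())


def all_mismatch_counts(sysfs_root="/sys/block"):
    """Return {array name: mismatch count} for every md array found."""
    counts = {}
    for entry in sorted(Path(sysfs_root).glob("md*/md/mismatch_cnt")):
        # entry is <root>/<mdX>/md/mismatch_cnt, so the array name is two levels up
        counts[entry.parent.parent.name] = int(entry.read_text().strip())
    return counts
```

Calling all_mismatch_counts() after each weekly check and logging
the result would at least show how the number moves over time,
though it still says nothing about which copy is the good one.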

Now, IMHO the check is completely useless and even annoying.

I've got mismatches, changing, but I do not know how
serious these are.

Not good... I could have lost data or not, and I do
not know...

> But for a more complete discussion on raid recovery and when it might be
> sensible to "vote" among the blocks, see
>    http://neil.brown.name/blog/20100211050355
> 

Nice discussion.
Especially the clarification about the unclean shutdown event.
This could be, in effect, a killer for the majority select
(or RAID-6 reconstruction) decision.

I personally agree with your conclusion.
Anyway, I am missing, or did not get, one more point.

Specifically, the "smart recovery" should be composed of
two steps. One is detecting where the problems are.
This means not only the stripe, but, in case of RAID-6,
also the *potential* component (HDD) of the array.

The reason is that, as I already wrote some time ago,
there is a *huge* difference between having all the
mismatches *potentially* on one single component, or
spread around several.

The first case clearly gives more information and allows
a better judgment of the situation.
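
To make the "potential component" idea concrete, here is a minimal
sketch of how a RAID-6 stripe could point at one suspect. It
assumes the usual P/Q definition over GF(2^8) (polynomial 0x11d,
generator 2) and that at most one block in the stripe is wrong; if
data disk z is off by e, then P is off by e and Q by g^z * e, so
the ratio of the two differences gives z. All names are
hypothetical, and this is not a description of what the md check
code does. Several bad blocks can alias to a single suspect, hence
only *potential*.

```python
# GF(2^8) log/antilog tables for polynomial 0x11d, generator 2 (assumed).
EXP = [0] * 510
LOG = [0] * 256
_x = 1
for _i in range(255):
    EXP[_i] = _x
    LOG[_x] = _i
    _x <<= 1
    if _x & 0x100:
        _x ^= 0x11D
for _i in range(255, 510):
    EXP[_i] = EXP[_i - 255]


def gf_mul(a, b):
    """Multiply two bytes in GF(2^8)."""
    if a == 0 or b == 0:
        return 0
    return EXP[LOG[a] + LOG[b]]


def syndromes(data):
    """Compute (P, Q) for a list of equal-length data blocks."""
    n = len(data[0])
    p, q = bytearray(n), bytearray(n)
    for z, block in enumerate(data):
        coeff = EXP[z]  # g^z for data disk z
        for i, byte in enumerate(block):
            p[i] ^= byte
            q[i] ^= gf_mul(coeff, byte)
    return bytes(p), bytes(q)


def locate_suspect(data, p_stored, q_stored):
    """Return 'clean', 'P', 'Q', 'D<z>' or 'multiple' for one stripe."""
    p_calc, q_calc = syndromes(data)
    dp = bytes(a ^ b for a, b in zip(p_calc, p_stored))
    dq = bytes(a ^ b for a, b in zip(q_calc, q_stored))
    if not any(dp) and not any(dq):
        return "clean"
    if not any(dp):
        return "Q"  # only Q disagrees
    if not any(dq):
        return "P"  # only P disagrees
    suspects = set()
    for a, b in zip(dp, dq):
        if a == 0 and b == 0:
            continue
        if a == 0 or b == 0:
            return "multiple"  # inconsistent with one bad data block
        suspects.add((LOG[b] - LOG[a]) % 255)  # z such that dq = g^z * dp
    if len(suspects) == 1:
        z = suspects.pop()
        if z < len(data):
            return "D%d" % z
    return "multiple"
```

Running this per stripe and counting how often each component is
named would show whether the mismatches cluster on one HDD or are
spread around several.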

Thanks,

bye,

-- 

piergiorgio