On Wed, 28 Apr 2010, Neil Brown wrote:
> > I think I can see a problem here:
> >
> > You had 11 active devices out of 12 when you received the read error.
> > At 11 devices out of 12 your array is singly degraded, and that should
> > be enough for raid6 to recompute the block from parity and perform the
> > rewrite, correcting the read error. Instead, MD declared the error
> > impossible to correct and dropped one more device, leaving the array
> > doubly degraded.
> >
> > I think this is an MD bug, and I think I know where it is:
> >
> > --- linux-2.6.33-vanilla/drivers/md/raid5.c	2010-02-24 19:52:17.000000000 +0100
> > +++ linux-2.6.33/drivers/md/raid5.c	2010-04-27 23:58:31.000000000 +0200
> > @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc
> >  		clear_bit(R5_UPTODATE, &sh->dev[i].flags);
> >  		atomic_inc(&rdev->read_errors);
> > -		if (conf->mddev->degraded)
> > +		if (conf->mddev->degraded == conf->max_degraded)
> >  			printk_rl(KERN_WARNING
> >  				  "raid5:%s: read error not correctable "
> >  				  "(sector %llu on %s).\n",
> > ------------------------------------------------------
> >
> > (This is just compile-tested, so try at your own risk.)
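> >
> > To see what this changes for the reported case, here is a minimal
> > userspace sketch of the test (not kernel code; degraded and
> > max_degraded simply mirror the raid5.c fields, with max_degraded == 2
> > for raid6):
> >
> > #include <stdio.h>
> >
> > /* Mirror of the patched condition: a read error is declared
> >  * uncorrectable only when the array is already maximally degraded. */
> > static int uncorrectable(int degraded, int max_degraded)
> > {
> > 	return degraded == max_degraded;
> > }
> >
> > int main(void)
> > {
> > 	/* Reported case: raid6, 11 of 12 devices, so degraded == 1.
> > 	 * The old test (degraded != 0) gave up here; the patched test
> > 	 * lets the parity rewrite go ahead. */
> > 	printf("singly degraded raid6: %d\n", uncorrectable(1, 2)); /* 0 */
> > 	printf("doubly degraded raid6: %d\n", uncorrectable(2, 2)); /* 1 */
> > 	return 0;
> > }
> >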
> > I'd like to hear what Neil thinks of this...
>
> I think you've found a real bug - thanks.
>
> I would make the test '>=' rather than '==', as that is safer; otherwise I
> agree:
> -	if (conf->mddev->degraded)
> +	if (conf->mddev->degraded >= conf->max_degraded)
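>
> As a quick illustration of the difference (the same style of userspace
> sketch as above, not kernel code): if degraded ever exceeded
> max_degraded, '==' would fall through and treat the error as
> correctable, while '>=' still gives up as it should:
>
> #include <assert.h>
>
> static int uncorrectable(int degraded, int max_degraded)
> {
> 	return degraded >= max_degraded;
> }
>
> int main(void)
> {
> 	assert(uncorrectable(2, 2));  /* raid6 at its failure limit */
> 	assert(uncorrectable(3, 2));  /* beyond the limit: '==' would miss this */
> 	assert(!uncorrectable(1, 2)); /* singly-degraded raid6 can still rewrite */
> 	return 0;
> }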
If raid6 handling can reach this code path, could I also point out that
the message says "raid5", which is confusing if it's actually referring
to a degraded raid6?
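
Something like this would disambiguate it (an untested sketch of the
same printk; it assumes conf->level carries the real level at this
point and that the other arguments stay as they are in 2.6.33):

	printk_rl(KERN_WARNING
		  "md/raid%d:%s: read error not correctable "
		  "(sector %llu on %s).\n",
		  conf->level, mdname(conf->mddev),
		  (unsigned long long)(sh->sector + rdev->data_offset),
		  bdn);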
--
Mikael Abrahamsson email: swmike@xxxxxxxxx