Re: Suggestion needed for fixing RAID6

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Wed, 28 Apr 2010, Neil Brown wrote:

I think I can see a problem here:
You had 11 active devices over 12 when you received the read error.
At 11 devices over 12 your array is singly-degraded and this should be
enough for raid6 to recompute the block from parity and perform the
rewrite, correcting the read-error, but instead MD declared that it's
impossible to correct the error, and dropped one more device (going to
doubly-degraded).

I think this is an MD bug, and I think I know where it is:


--- linux-2.6.33-vanilla/drivers/md/raid5.c     2010-02-24
19:52:17.000000000 +0100
+++ linux-2.6.33/drivers/md/raid5.c     2010-04-27 23:58:31.000000000 +0200
@@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc

                 clear_bit(R5_UPTODATE, &sh->dev[i].flags);
                 atomic_inc(&rdev->read_errors);
-               if (conf->mddev->degraded)
+               if (conf->mddev->degraded == conf->max_degraded)
                         printk_rl(KERN_WARNING
                                   "raid5:%s: read error not correctable "
                                   "(sector %llu on %s).\n",

------------------------------------------------------
(This is just compile-tested so try at your risk)

I'd like to hear what Neil thinks of this...

I think you've found a real bug - thanks.

It would make the test '>=' rather than '==' as that is safer, otherwise I
agree.

-               if (conf->mddev->degraded)
+               if (conf->mddev->degraded >= conf->max_degraded)

If a raid6 device handling can reach this code path, could I also point out that the message says "raid5" and that this is confusing if it's referring to a degraded raid6?

--
Mikael Abrahamsson    email: swmike@xxxxxxxxx
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux RAID Wiki]     [ATA RAID]     [Linux SCSI Target Infrastructure]     [Linux Block]     [Linux IDE]     [Linux SCSI]     [Linux Hams]     [Device Mapper]     [Device Mapper Cryptographics]     [Kernel]     [Linux Admin]     [Linux Net]     [GFS]     [RPM]     [git]     [Yosemite Forum]


  Powered by Linux