Re: Suggestion needed for fixing RAID6

On Wed, 28 Apr 2010 04:02:39 +0200 (CEST)
Mikael Abrahamsson <swmike@xxxxxxxxx> wrote:

> On Wed, 28 Apr 2010, Neil Brown wrote:
> 
> >> I think I can see a problem here:
> >> You had 11 active devices over 12 when you received the read error.
> >> At 11 devices over 12 your array is singly-degraded and this should be
> >> enough for raid6 to recompute the block from parity and perform the
> >> rewrite, correcting the read-error, but instead MD declared that it's
> >> impossible to correct the error, and dropped one more device (going to
> >> doubly-degraded).
> >>
> >> I think this is an MD bug, and I think I know where it is:
> >>
> >>
> >> --- linux-2.6.33-vanilla/drivers/md/raid5.c     2010-02-24 19:52:17.000000000 +0100
> >> +++ linux-2.6.33/drivers/md/raid5.c     2010-04-27 23:58:31.000000000 +0200
> >> @@ -1526,7 +1526,7 @@ static void raid5_end_read_request(struc
> >>
> >>                  clear_bit(R5_UPTODATE, &sh->dev[i].flags);
> >>                  atomic_inc(&rdev->read_errors);
> >> -               if (conf->mddev->degraded)
> >> +               if (conf->mddev->degraded == conf->max_degraded)
> >>                          printk_rl(KERN_WARNING
> >>                                    "raid5:%s: read error not correctable "
> >>                                    "(sector %llu on %s).\n",
> >>
> >> ------------------------------------------------------
> >> (This is just compile-tested so try at your risk)
> >>
> >> I'd like to hear what Neil thinks of this...
> >
> > I think you've found a real bug - thanks.
> >
> > I would make the test '>=' rather than '==' as that is safer; otherwise I
> > agree.
> >
> >> -               if (conf->mddev->degraded)
> >> +               if (conf->mddev->degraded >= conf->max_degraded)
> 
> If raid6 device handling can reach this code path, could I also point
> out that the message says "raid5", which is confusing if it's
> referring to a degraded raid6?
> 

You could....

There are lots of places that say "raid5" where it could apply to raid4
or raid6 as well.  Maybe I should change them all to 'raid456'...
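
For anyone following the thread, the effect of the changed test quoted
above can be sketched in plain userspace C. This is only an illustration,
not the kernel code: the two helper names are made up for this example,
and the arguments stand in for conf->mddev->degraded and
conf->max_degraded (assumed here to be 1 for raid4/raid5 and 2 for raid6).

```c
#include <stdbool.h>

/*
 * Hypothetical helpers, for illustration only; neither exists in raid5.c.
 * 'degraded' is the number of failed or missing devices, 'max_degraded'
 * the number of failures the level can tolerate (assumed: 1 for
 * raid4/raid5, 2 for raid6).
 */

/* Old test: any degradation at all makes a read error "not correctable". */
static inline bool read_error_uncorrectable_old(int degraded)
{
	return degraded != 0;
}

/*
 * Patched test: the error is only uncorrectable once no redundancy is
 * left. '>=' rather than '==' also covers a count that has somehow gone
 * past max_degraded.
 */
static inline bool read_error_uncorrectable_new(int degraded, int max_degraded)
{
	return degraded >= max_degraded;
}
```

With degraded = 1 and max_degraded = 2 (the 11-of-12 raid6 case reported
in this thread) the old test gives up, while the patched test still
allows the block to be recomputed and rewritten. For raid5 (max_degraded
= 1) the two tests agree.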

NeilBrown