Re: RAID5: failing an active component during spare rebuild - arrays hangs

On Wed, 14 Dec 2011 12:27:43 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
wrote:

> Hello Neil,
> we are looking at Ubuntu-oneiric kernel 3.0.0-14.23.
> We see that this fix was delivered to it by the following commit:
> ---------------------------------
> commit 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
> Author: NeilBrown <neilb@xxxxxxx>
> Date:   Wed Oct 26 10:31:04 2011 +1100
> 
>     md/raid5: fix bug that could result in reads from a failed device.
> 
>     BugLink: http://bugs.launchpad.net/bugs/890952
> 
>     commit 355840e7a7e56bb2834fd3b0da64da5465f8aeaa upstream.
> ------------------------------------
> However, when looking at the diff, we see that only the handle_stripe6()
> function was fixed and not handle_stripe5(). That also explains why we
> saw this issue on oneiric with raid5. Here is the diff:
> ----------------------------------------------------------
> alex@ubuntu-alyakas-srv:/mnt/share/src/ubuntu-oneiric$ git diff ccfe5df60a583cbad36969344679903585e2eac7 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index 2581ba1..e509147 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -3369,7 +3369,7 @@ static void handle_stripe6(struct stripe_head *sh)
>                         /* Not in-sync */;
>                 else if (test_bit(In_sync, &rdev->flags))
>                         set_bit(R5_Insync, &dev->flags);
> -               else {
> +               else if (!test_bit(Faulty, &rdev->flags)) {
>                         /* in sync if before recovery_offset */
>                         if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>                                 set_bit(R5_Insync, &dev->flags);
> -----------------------------------------------
> 
> Why was the fix not applied to raid5 there? Should we apply the same
> fix for raid5 manually as well?
> Also copying the other two people who signed off on the commit.

Yes, I stuffed up when I back-ported the patch for -stable and missed the
RAID5 bit. I've been meaning to send an update to stable but haven't yet.
Will do it in the morning - thanks for the reminder.

NeilBrown


> 
> Thanks,
>   Alex.
> 
> 
> On Tue, Dec 6, 2011 at 11:21 PM, NeilBrown <neilb@xxxxxxx> wrote:
> > On Tue, 6 Dec 2011 23:07:53 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> > wrote:
> >
> >> Thanks, Neil!!!
> >> Looks like this patch solves the issue. I applied it manually, though;
> >> for some reason git refused to apply it.
> >>
> >> Thanks again for the great help,
> >>   Alex.
> >
> > Great.  Thanks for the confirmation.
> >
> > NeilBrown
> >
> >
> >>
> >>
> >> On Tue, Dec 6, 2011 at 5:16 AM, NeilBrown <neilb@xxxxxxx> wrote:
> >> > On Sun, 27 Nov 2011 11:56:17 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> >> > wrote:
> >> >
> >> >> Hello Neil,
> >> >> we have compiled the natty kernel with dynamic debugging enabled for
> >> >> raid456, and reproduced the problem.
> >> >> The kernel log is available at
> >> >> https://docs.google.com/open?id=0B9rmyUifdvMLMzk1YjYwZDUtYzhhYi00MDRlLTkzYjItMDM0Y2ZhZmU3ZDRk
> >> >>
> >> >> Some more information:
> >> >> - array was created at Nov 27 11:28:03
> >> >> - manual drive failure was issued at 11:28:09
> >> >>
> >> >> Please let me know if you need any additional information.
> >> >>
> >> >
> >> > Hi,
> >> >  sorry for the long delay, I've had a lot of distractions this past week.
> >> >
> >> > It looks like you are hitting the bug fixed by upstream commit
> >> >    355840e7a7e56bb2834fd3b0da64da5465f8aeaa
> >> >
> >> > The symptoms are slightly different to those described in that commit but I'm
> >> > sure the root problem is the same.
> >> >
> >> > That patch doesn't apply to 2.6.38 though.
> >> > Use this one.
> >> >
> >> > NeilBrown
> >> >
> >> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> >> > index 78536fd..8144126 100644
> >> > --- a/drivers/md/raid5.c
> >> > +++ b/drivers/md/raid5.c
> >> > @@ -3086,7 +3086,7 @@ static void handle_stripe5(struct stripe_head *sh)
> >> >                        /* Not in-sync */;
> >> >                else if (test_bit(In_sync, &rdev->flags))
> >> >                        set_bit(R5_Insync, &dev->flags);
> >> > -               else {
> >> > +               else if (!test_bit(Faulty, &rdev->flags)) {
> >> >                        /* could be in-sync depending on recovery/reshape status */
> >> >                        if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
> >> >                                set_bit(R5_Insync, &dev->flags);
> >> > @@ -3377,7 +3377,7 @@ static void handle_stripe6(struct stripe_head *sh)
> >> >                        /* Not in-sync */;
> >> >                else if (test_bit(In_sync, &rdev->flags))
> >> >                        set_bit(R5_Insync, &dev->flags);
> >> > -               else {
> >> > +               else if (!test_bit(Faulty, &rdev->flags)) {
> >> >                        /* in sync if before recovery_offset */
> >> >                        if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
> >> >                                set_bit(R5_Insync, &dev->flags);
> >
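
The one-line change in the patches quoted above can be modelled in user space.
What follows is a minimal sketch, not kernel code: struct model_rdev, the
insync_before_fix() and insync_after_fix() helpers, and MODEL_STRIPE_SECTORS
are hypothetical stand-ins for the rdev flags (In_sync, Faulty),
rdev->recovery_offset and STRIPE_SECTORS that appear in the diff. The value 8
for the stripe size and the scenario of a failed device whose recovery_offset
still lies beyond the stripe are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed stripe size in sectors, for illustration only. */
#define MODEL_STRIPE_SECTORS 8ULL

/* Hypothetical user-space stand-in for the fields the diff tests. */
struct model_rdev {
	bool in_sync;                       /* models the In_sync flag */
	bool faulty;                        /* models the Faulty flag */
	unsigned long long recovery_offset; /* models rdev->recovery_offset */
};

/* Decision as in the removed "else {" branch: Faulty is never consulted. */
static bool insync_before_fix(const struct model_rdev *rdev,
			      unsigned long long stripe_sector)
{
	if (rdev == NULL)
		return false; /* Not in-sync */
	if (rdev->in_sync)
		return true;
	return stripe_sector + MODEL_STRIPE_SECTORS <= rdev->recovery_offset;
}

/* Decision as in the added "else if (!test_bit(Faulty, ...))" branch. */
static bool insync_after_fix(const struct model_rdev *rdev,
			     unsigned long long stripe_sector)
{
	if (rdev == NULL)
		return false; /* Not in-sync */
	if (rdev->in_sync)
		return true;
	if (rdev->faulty)
		return false; /* a failed device is never treated as in-sync */
	return stripe_sector + MODEL_STRIPE_SECTORS <= rdev->recovery_offset;
}

/* Walks through the assumed scenario and checks both variants. */
static void model_self_check(void)
{
	/* Failed device: not In_sync, Faulty, recovery_offset past the stripe. */
	struct model_rdev failed = { false, true, 1024 };
	/* Rebuilding spare: not In_sync, not Faulty, recovered up to 1024. */
	struct model_rdev rebuilding = { false, false, 1024 };

	assert(insync_before_fix(&failed, 0));        /* old logic trusts it */
	assert(!insync_after_fix(&failed, 0));        /* new logic does not */
	assert(insync_after_fix(&rebuilding, 0));     /* 0 + 8 <= 1024 */
	assert(!insync_after_fix(&rebuilding, 1024)); /* 1024 + 8 > 1024 */
}
```

Calling model_self_check() from a small test program exercises both variants.
In this model the only input on which they differ is a device that is Faulty
but not In_sync, with the stripe ending at or before recovery_offset, which is
the case the extra Faulty test in the patch excludes.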
