Re: RAID5: failing an active component during spare rebuild - arrays hangs

Thanks for confirming that, Neil. We will add the raid5 change manually.
Alex.


On Wed, Dec 14, 2011 at 1:32 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Wed, 14 Dec 2011 12:27:43 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> wrote:
>
>> Hello Neil,
>> we are looking at Ubuntu-oneiric kernel 3.0.0-14.23.
>> We see that this fix was delivered to it by the following commit:
>> ---------------------------------
>> commit 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
>> Author: NeilBrown <neilb@xxxxxxx>
>> Date:   Wed Oct 26 10:31:04 2011 +1100
>>
>>     md/raid5: fix bug that could result in reads from a failed device.
>>
>>     BugLink: http://bugs.launchpad.net/bugs/890952
>>
>>     commit 355840e7a7e56bb2834fd3b0da64da5465f8aeaa upstream.
>> ------------------------------------
>> However, when looking at the diff, we see that only the handle_stripe6()
>> function was fixed and not handle_stripe5(). That also explains why we
>> saw this issue on oneiric with raid5. Here is the diff:
>> ----------------------------------------------------------
>> alex@ubuntu-alyakas-srv:/mnt/share/src/ubuntu-oneiric$ git diff ccfe5df60a583cbad36969344679903585e2eac7 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 2581ba1..e509147 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -3369,7 +3369,7 @@ static void handle_stripe6(struct stripe_head *sh)
>>                         /* Not in-sync */;
>>                 else if (test_bit(In_sync, &rdev->flags))
>>                         set_bit(R5_Insync, &dev->flags);
>> -               else {
>> +               else if (!test_bit(Faulty, &rdev->flags)) {
>>                         /* in sync if before recovery_offset */
>>                         if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>>                                 set_bit(R5_Insync, &dev->flags);
>> -----------------------------------------------
>>
>> Why was the fix not applied to handle_stripe5() there? Should we
>> apply the same fix for raid5 manually as well?
>> Also copying the other two people who signed off on the commit.
>
> Yes, I stuffed up when I back-ported the patch for -stable and missed the
> RAID5 bit.  I've been meaning to send an update to stable but haven't yet.
> Will do it in the morning - thanks for the reminder.
>
> NeilBrown
>
>
>>
>> Thanks,
>>   Alex.
>>
>>
>> On Tue, Dec 6, 2011 at 11:21 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> > On Tue, 6 Dec 2011 23:07:53 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
>> > wrote:
>> >
>> >> Thanks, Neil!!!
>> >> Looks like this patch solves the issue. I applied it manually though,
>> >> for some reason git refused to apply it.
>> >>
>> >> Thanks again for great help,
>> >>   Alex.
>> >
>> > Great.  Thanks for the confirmation.
>> >
>> > NeilBrown
>> >
>> >
>> >>
>> >>
>> >> On Tue, Dec 6, 2011 at 5:16 AM, NeilBrown <neilb@xxxxxxx> wrote:
>> >> > On Sun, 27 Nov 2011 11:56:17 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
>> >> > wrote:
>> >> >
>> >> >> Hello Neil,
>> >> >> we have compiled the natty kernel with dynamic debugging enabled for
>> >> >> raid456, and reproduced the problem.
>> >> >> The kernel log is available at
>> >> >> https://docs.google.com/open?id=0B9rmyUifdvMLMzk1YjYwZDUtYzhhYi00MDRlLTkzYjItMDM0Y2ZhZmU3ZDRk
>> >> >>
>> >> >> Some more information:
>> >> >> - array was created at Nov 27 11:28:03
>> >> >> - manual drive failure was issued at 11:28:09
>> >> >>
>> >> >> Please let me know if you need any additional information.
>> >> >>
>> >> >
>> >> > Hi,
>> >> >  sorry for the long delay, I've had a lot of distractions this past week.
>> >> >
>> >> > It looks like you are hitting the bug fixed by upstream commit
>> >> >    355840e7a7e56bb2834fd3b0da64da5465f8aeaa
>> >> >
>> >> > The symptoms are slightly different to those described in that commit but I'm
>> >> > sure the root problem is the same.
>> >> >
>> >> > That patch doesn't apply to 2.6.38 though.
>> >> > Use this one.
>> >> >
>> >> > NeilBrown
>> >> >
>> >> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> >> > index 78536fd..8144126 100644
>> >> > --- a/drivers/md/raid5.c
>> >> > +++ b/drivers/md/raid5.c
>> >> > @@ -3086,7 +3086,7 @@ static void handle_stripe5(struct stripe_head *sh)
>> >> >                        /* Not in-sync */;
>> >> >                else if (test_bit(In_sync, &rdev->flags))
>> >> >                        set_bit(R5_Insync, &dev->flags);
>> >> > -               else {
>> >> > +               else if (!test_bit(Faulty, &rdev->flags)) {
>> >> >                        /* could be in-sync depending on recovery/reshape status */
>> >> >                        if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>> >> >                                set_bit(R5_Insync, &dev->flags);
>> >> > @@ -3377,7 +3377,7 @@ static void handle_stripe6(struct stripe_head *sh)
>> >> >                        /* Not in-sync */;
>> >> >                else if (test_bit(In_sync, &rdev->flags))
>> >> >                        set_bit(R5_Insync, &dev->flags);
>> >> > -               else {
>> >> > +               else if (!test_bit(Faulty, &rdev->flags)) {
>> >> >                        /* in sync if before recovery_offset */
>> >> >                        if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>> >> >                                set_bit(R5_Insync, &dev->flags);
>> >
>
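
For readers who want to see what the one-line change does, the following is a
minimal user-space sketch of the decision made in the patched branch. It is not
kernel code: struct model_rdev, the MODEL_* names and both functions are
hypothetical stand-ins, the leading NULL check is assumed from the
"/* Not in-sync */" branch visible in the diff, and the stripe size of 8
sectors is an assumption (4 KiB pages, 512-byte sectors). It only illustrates
that, before the patch, a Faulty device that was still recovering could be
treated as in-sync for stripes below its recovery_offset.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the rdev flag bits; not the kernel definitions. */
enum { MODEL_IN_SYNC = 1u << 0, MODEL_FAULTY = 1u << 1 };

/* Assumed value: one 4 KiB stripe unit expressed in 512-byte sectors. */
#define MODEL_STRIPE_SECTORS 8u

struct model_rdev {
    unsigned flags;           /* combination of MODEL_* bits */
    uint64_t recovery_offset; /* sectors below this have been rebuilt */
};

/* Pre-patch logic: every device that is not In_sync reaches the bare else. */
static bool insync_before_patch(const struct model_rdev *rdev, uint64_t sector)
{
    if (rdev == NULL)
        return false; /* no device in this slot */
    if (rdev->flags & MODEL_IN_SYNC)
        return true;
    /* A Faulty, partially rebuilt spare is still considered here. */
    return sector + MODEL_STRIPE_SECTORS <= rdev->recovery_offset;
}

/* Post-patch logic: the recovery_offset test is skipped for Faulty devices. */
static bool insync_after_patch(const struct model_rdev *rdev, uint64_t sector)
{
    if (rdev == NULL)
        return false;
    if (rdev->flags & MODEL_IN_SYNC)
        return true;
    if (!(rdev->flags & MODEL_FAULTY))
        return sector + MODEL_STRIPE_SECTORS <= rdev->recovery_offset;
    return false; /* Faulty: never in-sync, so it is never read from */
}

/* Small self-check: a spare that failed after rebuilding 1024 sectors. */
void model_self_check(void)
{
    struct model_rdev failed_spare = { MODEL_FAULTY, 1024 };

    assert(insync_before_patch(&failed_spare, 0));  /* the bug */
    assert(!insync_after_patch(&failed_spare, 0));  /* the fix */
}
```

Calling model_self_check() from a test program exercises the failing case
discussed in this thread. A healthy spare (no MODEL_FAULTY bit) gives the same
answer from both functions, which is why the change only matters once a
recovering device has been marked Faulty.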