Re: RAID5: failing an active component during spare rebuild - arrays hangs

Hello Neil,
we are looking at Ubuntu-oneiric kernel 3.0.0-14.23.
We see that this fix was delivered to it by the following commit:
---------------------------------
commit 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
Author: NeilBrown <neilb@xxxxxxx>
Date:   Wed Oct 26 10:31:04 2011 +1100

    md/raid5: fix bug that could result in reads from a failed device.

    BugLink: http://bugs.launchpad.net/bugs/890952

    commit 355840e7a7e56bb2834fd3b0da64da5465f8aeaa upstream.
------------------------------------
However, when looking at the diff, we see that only the
handle_stripe6() function was fixed, not handle_stripe5(). That also
explains why we still saw this issue on oneiric with raid5. Here is the diff:
----------------------------------------------------------
alex@ubuntu-alyakas-srv:/mnt/share/src/ubuntu-oneiric$ git diff ccfe5df60a583cbad36969344679903585e2eac7 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index 2581ba1..e509147 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -3369,7 +3369,7 @@ static void handle_stripe6(struct stripe_head *sh)
                        /* Not in-sync */;
                else if (test_bit(In_sync, &rdev->flags))
                        set_bit(R5_Insync, &dev->flags);
-               else {
+               else if (!test_bit(Faulty, &rdev->flags)) {
                        /* in sync if before recovery_offset */
                        if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
                                set_bit(R5_Insync, &dev->flags);
-----------------------------------------------
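To make the effect of the one-line change easier to see, here is a
minimal userspace sketch of the decision the patched branch makes. It
is not kernel code: struct model_rdev, MODEL_STRIPE_SECTORS,
dev_insync_old() and dev_insync_fixed() are hypothetical names invented
for illustration, the leading NULL check is assumed from the
"/* Not in-sync */" context line, and the sketch assumes that a device
marked Faulty can still carry a recovery_offset beyond the stripe being
handled.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the rdev state the diff tests. */
struct model_rdev {
    bool in_sync;                        /* models the In_sync flag */
    bool faulty;                         /* models the Faulty flag */
    unsigned long long recovery_offset;  /* sectors already recovered */
};

/* Assumed stripe size in sectors; the real STRIPE_SECTORS may differ. */
#define MODEL_STRIPE_SECTORS 8ULL

/* Pre-fix logic: any device that is not In_sync falls into the
 * recovery_offset comparison, even if it has been marked Faulty. */
static bool dev_insync_old(const struct model_rdev *rdev,
                           unsigned long long sector)
{
    if (rdev == NULL)
        return false;                    /* no device: not in sync */
    if (rdev->in_sync)
        return true;
    return sector + MODEL_STRIPE_SECTORS <= rdev->recovery_offset;
}

/* Post-fix logic: a Faulty device is never treated as in sync, so the
 * stripe handler has no reason to schedule reads from it. */
static bool dev_insync_fixed(const struct model_rdev *rdev,
                             unsigned long long sector)
{
    if (rdev == NULL)
        return false;
    if (rdev->in_sync)
        return true;
    if (rdev->faulty)
        return false;                    /* the added !Faulty test */
    return sector + MODEL_STRIPE_SECTORS <= rdev->recovery_offset;
}
```

In this model, a device with in_sync cleared, faulty set and a large
recovery_offset is reported as in sync by dev_insync_old() but not by
dev_insync_fixed(), which is the difference the raid6 hunk introduces
and the raid5 path in this tree appears to lack.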

Why was the fix for raid5 not applied there? Should we manually apply
the same fix to handle_stripe5() as well?
I am also copying the other two people who signed off on the commit.

Thanks,
  Alex.


On Tue, Dec 6, 2011 at 11:21 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Tue, 6 Dec 2011 23:07:53 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> wrote:
>
>> Thanks, Neil!!!
>> Looks like this patch solves the issue. I applied it manually, though;
>> for some reason git refused to apply it.
>>
>> Thanks again for great help,
>>   Alex.
>
> Great.  Thanks for the confirmation.
>
> NeilBrown
>
>
>>
>>
>> On Tue, Dec 6, 2011 at 5:16 AM, NeilBrown <neilb@xxxxxxx> wrote:
>> > On Sun, 27 Nov 2011 11:56:17 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
>> > wrote:
>> >
>> >> Hello Neil,
>> >> we have compiled the natty kernel with dynamic debugging enabled for
>> >> raid456, and reproduced the problem.
>> >> The kernel log is available at
>> >> https://docs.google.com/open?id=0B9rmyUifdvMLMzk1YjYwZDUtYzhhYi00MDRlLTkzYjItMDM0Y2ZhZmU3ZDRk
>> >>
>> >> Some more information:
>> >> - array was created at Nov 27 11:28:03
>> >> - manual drive failure was issued at 11:28:09
>> >>
>> >> Please let me know if you need any additional information.
>> >>
>> >
>> > Hi,
>> >  sorry for the long delay, I've had a lot of distractions this past week.
>> >
>> > It looks like you are hitting the bug fixed by upstream commit
>> >    355840e7a7e56bb2834fd3b0da64da5465f8aeaa
>> >
>> > The symptoms are slightly different to those described in that commit but I'm
>> > sure the root problem is the same.
>> >
>> > That patch doesn't apply to 2.6.38 though.
>> > Use this one.
>> >
>> > NeilBrown
>> >
>> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> > index 78536fd..8144126 100644
>> > --- a/drivers/md/raid5.c
>> > +++ b/drivers/md/raid5.c
>> > @@ -3086,7 +3086,7 @@ static void handle_stripe5(struct stripe_head *sh)
>> >                        /* Not in-sync */;
>> >                else if (test_bit(In_sync, &rdev->flags))
>> >                        set_bit(R5_Insync, &dev->flags);
>> > -               else {
>> > +               else if (!test_bit(Faulty, &rdev->flags)) {
>> >                        /* could be in-sync depending on recovery/reshape status */
>> >                        if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>> >                                set_bit(R5_Insync, &dev->flags);
>> > @@ -3377,7 +3377,7 @@ static void handle_stripe6(struct stripe_head *sh)
>> >                        /* Not in-sync */;
>> >                else if (test_bit(In_sync, &rdev->flags))
>> >                        set_bit(R5_Insync, &dev->flags);
>> > -               else {
>> > +               else if (!test_bit(Faulty, &rdev->flags)) {
>> >                        /* in sync if before recovery_offset */
>> >                        if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>> >                                set_bit(R5_Insync, &dev->flags);
>