Thanks for confirming that, Neil. We will add the raid5 change manually.

Alex.

On Wed, Dec 14, 2011 at 1:32 PM, NeilBrown <neilb@xxxxxxx> wrote:
> On Wed, 14 Dec 2011 12:27:43 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
> wrote:
>
>> Hello Neil,
>> we are looking at Ubuntu-oneiric kernel 3.0.0-14.23.
>> We see that this fix was delivered to it by the following commit:
>> ---------------------------------
>> commit 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
>> Author: NeilBrown <neilb@xxxxxxx>
>> Date: Wed Oct 26 10:31:04 2011 +1100
>>
>> md/raid5: fix bug that could result in reads from a failed device.
>>
>> BugLink: http://bugs.launchpad.net/bugs/890952
>>
>> commit 355840e7a7e56bb2834fd3b0da64da5465f8aeaa upstream.
>> ------------------------------------
>> However, when looking at the diff, we see that only handle_stripe6()
>> function was fixed and not handle_stripe5(). That also explains why we
>> saw this issue on oneiric with raid5. Here is the diff:
>> ----------------------------------------------------------
>> alex@ubuntu-alyakas-srv:/mnt/share/src/ubuntu-oneiric$ git diff
>> ccfe5df60a583cbad36969344679903585e2eac7
>> 5669de653e363cfaf2a2c7c48ea224a730f5a7a9
>> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> index 2581ba1..e509147 100644
>> --- a/drivers/md/raid5.c
>> +++ b/drivers/md/raid5.c
>> @@ -3369,7 +3369,7 @@ static void handle_stripe6(struct stripe_head *sh)
>> /* Not in-sync */;
>> else if (test_bit(In_sync, &rdev->flags))
>> set_bit(R5_Insync, &dev->flags);
>> - else {
>> + else if (!test_bit(Faulty, &rdev->flags)) {
>> /* in sync if before recovery_offset */
>> if (sh->sector + STRIPE_SECTORS <=
>> rdev->recovery_offset)
>> set_bit(R5_Insync, &dev->flags);
>> -----------------------------------------------
>>
>> What is the reason the fix for raid5 was not applied there? Should we
>> apply the same fix for raid5 as well manually?
>> Copying also other two persons signed on the commit.
>
> Yes, I stuffed up when I back-ported the patch for -stable and missed the
> RAID5 bit. I've been meaning to send an update to stable but haven't yet.
> Will do it in the morning - thanks for the reminder.
>
> NeilBrown
>
>
>>
>> Thanks,
>> Alex.
>>
>>
>> On Tue, Dec 6, 2011 at 11:21 PM, NeilBrown <neilb@xxxxxxx> wrote:
>> > On Tue, 6 Dec 2011 23:07:53 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
>> > wrote:
>> >
>> >> Thanks, Neil!!!
>> >> Looks like this patch solves the issue. I applied it manually though,
>> >> for some reason git refused to apply it.
>> >>
>> >> Thanks again for great help,
>> >> Alex.
>> >
>> > Great. Thanks for the confirmation.
>> >
>> > NeilBrown
>> >
>> >
>> >>
>> >>
>> >> On Tue, Dec 6, 2011 at 5:16 AM, NeilBrown <neilb@xxxxxxx> wrote:
>> >> > On Sun, 27 Nov 2011 11:56:17 +0200 Alexander Lyakas <alex.bolshoy@xxxxxxxxx>
>> >> > wrote:
>> >> >
>> >> >> Hello Neil,
>> >> >> we have compiled the natty kernel with dynamic debugging enabled for
>> >> >> raid456, and reproduced the problem.
>> >> >> The kernel log is available at
>> >> >> https://docs.google.com/open?id=0B9rmyUifdvMLMzk1YjYwZDUtYzhhYi00MDRlLTkzYjItMDM0Y2ZhZmU3ZDRk
>> >> >>
>> >> >> Some more information:
>> >> >> - array was created at Nov 27 11:28:03
>> >> >> - manual drive failure was issued at 11:28:09
>> >> >>
>> >> >> Please let me know if you need any additional information.
>> >> >>
>> >> >
>> >> > Hi,
>> >> > sorry for the long delay, I've had a lot of distractions this past week.
>> >> >
>> >> > It looks like you are hitting the bug fixed by upstream commit
>> >> > 355840e7a7e56bb2834fd3b0da64da5465f8aeaa
>> >> >
>> >> > The symptoms are slightly different to those described in that commit but I'm
>> >> > sure the root problem is the same.
>> >> >
>> >> > That patch doesn't apply to 2.6.38 though.
>> >> > Use this one.
>> >> >
>> >> > NeilBrown
>> >> >
>> >> > diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
>> >> > index 78536fd..8144126 100644
>> >> > --- a/drivers/md/raid5.c
>> >> > +++ b/drivers/md/raid5.c
>> >> > @@ -3086,7 +3086,7 @@ static void handle_stripe5(struct stripe_head *sh)
>> >> > /* Not in-sync */;
>> >> > else if (test_bit(In_sync, &rdev->flags))
>> >> > set_bit(R5_Insync, &dev->flags);
>> >> > - else {
>> >> > + else if (!test_bit(Faulty, &rdev->flags)) {
>> >> > /* could be in-sync depending on recovery/reshape status */
>> >> > if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>> >> > set_bit(R5_Insync, &dev->flags);
>> >> > @@ -3377,7 +3377,7 @@ static void handle_stripe6(struct stripe_head *sh)
>> >> > /* Not in-sync */;
>> >> > else if (test_bit(In_sync, &rdev->flags))
>> >> > set_bit(R5_Insync, &dev->flags);
>> >> > - else {
>> >> > + else if (!test_bit(Faulty, &rdev->flags)) {
>> >> > /* in sync if before recovery_offset */
>> >> > if (sh->sector + STRIPE_SECTORS <= rdev->recovery_offset)
>> >> > set_bit(R5_Insync, &dev->flags);
>> >
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
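
For anyone applying the handle_stripe5() hunk by hand, the decision the patch changes can be modelled in userspace roughly as follows. This is only a sketch and not kernel code: struct model_rdev, stripe_dev_insync() and the with_fix switch are hypothetical names made up for illustration, STRIPE_SECTORS is assumed to be 8, and the NULL check stands in for the branch that precedes the "Not in-sync" comment in the diffs (the condition itself is not shown in the hunks, so that part is an assumption).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed value for illustration; the kernel derives it from the stripe size. */
#define STRIPE_SECTORS 8u

/* Hypothetical stand-in for the rdev fields the patch looks at. */
struct model_rdev {
    bool in_sync;             /* models the In_sync flag */
    bool faulty;              /* models the Faulty flag */
    uint64_t recovery_offset; /* sectors below this offset are recovered */
};

/*
 * Returns true where the kernel code would set R5_Insync for this device.
 * with_fix == false models the old "else {" branch,
 * with_fix == true models "else if (!test_bit(Faulty, &rdev->flags)) {".
 */
static bool stripe_dev_insync(const struct model_rdev *rdev,
                              uint64_t sector, bool with_fix)
{
    if (rdev == NULL)
        return false;         /* no device: not in-sync (assumed condition) */
    if (rdev->in_sync)
        return true;          /* fully in-sync device */
    if (with_fix && rdev->faulty)
        return false;         /* patched: never trust a failed device */
    /* Partially recovered device: in sync only below recovery_offset. */
    return sector + STRIPE_SECTORS <= rdev->recovery_offset;
}
```

With a device that is Faulty, not In_sync, and has a recovery_offset beyond the stripe, the unpatched path returns true (the stripe may be read from the failed device) while the patched path returns false. A non-faulty, partially recovered device behaves the same either way.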