Re: raid5, 2 drives dead at same time, kernel will Oops?

Hi,

I have some other issues under this "more than one
arm broken in a raid5 array" condition. The next
important one is this:

Suppose I have two raid5 arrays: a resync is running on the
first, and the resync of the second is scheduled to start
once the first array has finished syncing.
If I then kill two arms in the first array, its resync
stops but is never aborted; consequently, the second array
never gets a chance to start its resync.
Is there a fix for this?

Thanks

-W
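To make the scheduling behaviour described above concrete, here is a
minimal userspace model of it (a sketch only: the flag and wait-queue
names are borrowed from md for illustration, and this is not the md
implementation). Two resync threads serialize on a shared in-progress
flag; if the first one stalls without ever running its abort/cleanup
path, the flag is never cleared and the second thread waits forever,
which matches the hang being reported.

/*
 * Toy model of the reported hang: two resyncs serialize on a shared
 * in-progress flag.  If the first resync stalls without ever reaching
 * the cleanup at the bottom (i.e. it stops but is never aborted), the
 * second resync blocks in pthread_cond_wait() indefinitely.
 * Illustrative userspace code only -- not kernel code.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t resync_wait = PTHREAD_COND_INITIALIZER;
static int resync_in_progress;  /* stands in for md's "a resync is running" state */

static void *resync_thread(void *arg)
{
        const char *name = arg;

        /* only one resync may run at a time */
        pthread_mutex_lock(&lock);
        while (resync_in_progress)
                pthread_cond_wait(&resync_wait, &lock);
        resync_in_progress = 1;
        pthread_mutex_unlock(&lock);

        printf("%s: resync running\n", name);
        /*
         * ... resync work here.  Per the report, after a two-arm
         * failure the first resync stops at this point and never
         * executes the cleanup that follows.
         */

        pthread_mutex_lock(&lock);
        resync_in_progress = 0;         /* an abort path must also do this */
        pthread_cond_broadcast(&resync_wait);
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, resync_thread, "md0");
        pthread_create(&t2, NULL, resync_thread, "md1");
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
}

In these terms, the fix being asked for amounts to making the
two-arm-failure path behave like a proper abort: clear the in-progress
state and wake any waiting resync.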


----- Original Message -----
From: "3tcdgwg3" <3tcdgwg3@prodigy.net>
To: "Neil Brown" <neilb@cse.unsw.edu.au>
Cc: <linux-raid@vger.kernel.org>
Sent: Wednesday, May 21, 2003 6:04 PM
Subject: Re: raid5, 2 drives dead at same time, kernel will Oops?


> Neil,
> Preliminary test looks good; I will test more
> when I have time.
>
> Thanks,
> -Will.
> ----- Original Message -----
> From: "Neil Brown" <neilb@cse.unsw.edu.au>
> To: "3tcdgwg3" <3tcdgwg3@prodigy.net>
> Cc: <linux-raid@vger.kernel.org>
> Sent: Tuesday, May 20, 2003 7:42 PM
> Subject: Re: raid5, 2 drives dead at same time, kernel will Oops?
>
>
> > On Monday May 19, 3tcdgwg3@prodigy.net wrote:
> > > Hi,
> > >
> > > I am trying to simulate the case where two drives
> > > in an array fail at the same time.
> > > Using two IDE drives, I created a
> > > raid5 array with 4 arms, as follows:
> > >
> > > /dev/hdc1
> > > /dev/hde1
> > > /dev/hdc2
> > > /dev/hde2
> > >
> > > This is just for testing; I know that creating two
> > > arms on one hard drive doesn't make much sense.
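For reference, a four-arm layout like the one above would typically
have been described to the 2.4-era raidtools in /etc/raidtab roughly as
follows (a sketch under the assumption that raidtools/mkraid was used;
the /dev/md0 name, chunk size, and superblock settings are
illustrative, not taken from the report):

raiddev /dev/md0
        raid-level              5
        nr-raid-disks           4
        nr-spare-disks          0
        persistent-superblock   1
        chunk-size              32
        device          /dev/hdc1
        raid-disk       0
        device          /dev/hde1
        raid-disk       1
        device          /dev/hdc2
        raid-disk       2
        device          /dev/hde2
        raid-disk       3

With this layout, powering off /dev/hde takes out raid-disks 1 and 3 at
once, which is exactly the double failure being simulated.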
> > >
> > >
> > > Anyway, while running this array, if I power off one
> > > of the hard drives (/dev/hde) to simulate two arms failing
> > > at the same time, I get a system Oops. I am using
> > > the 2.4.18 kernel.
> > >
> > > Can anyone tell me whether this is expected, or whether there is a fix for it?
> > >
> >
> > Congratulations and thanks.  You have managed to trigger a bug that
> > no-one else has found.
> >
> > The following patch (against 2.4.20) should fix it.  If you can test
> > and confirm I would really appreciate it.
> >
> > NeilBrown
> >
> >
> > ------------------------------------------------------------
> > Handle concurrent failure of two drives in raid5
> >
> > If two drives both fail during a write request, raid5 doesn't
> > cope properly and will eventually oops.
> >
> > With this patch, blocks that have already been 'written'
> > are failed when double drive failure is noticed, as well as
> > blocks that are about to be written.
> >
> >  ----------- Diffstat output ------------
> >  ./drivers/md/raid5.c |   10 +++++++++-
> >  1 files changed, 9 insertions(+), 1 deletion(-)
> >
> > diff ./drivers/md/raid5.c~current~ ./drivers/md/raid5.c
> > --- ./drivers/md/raid5.c~current~ 2003-05-21 12:42:07.000000000 +1000
> > +++ ./drivers/md/raid5.c 2003-05-21 12:37:37.000000000 +1000
> > @@ -882,7 +882,7 @@ static void handle_stripe(struct stripe_
> >          /* check if the array has lost two devices and, if so, some requests might
> >           * need to be failed
> >           */
> > -        if (failed > 1 && to_read+to_write) {
> > +        if (failed > 1 && to_read+to_write+written) {
> >                  for (i=disks; i--; ) {
> >                          /* fail all writes first */
> >                          if (sh->bh_write[i]) to_write--;
> > @@ -891,6 +891,14 @@ static void handle_stripe(struct stripe_
> >                                  bh->b_reqnext = return_fail;
> >                                  return_fail = bh;
> >                          }
> > +                        /* and fail all 'written' */
> > +                        if (sh->bh_written[i]) written--;
> > +                        while ((bh = sh->bh_written[i])) {
> > +                                sh->bh_written[i] = bh->b_reqnext;
> > +                                bh->b_reqnext = return_fail;
> > +                                return_fail = bh;
> > +                        }
> > +
> >                          /* fail any reads if this device is non-operational */
> >                          if (!conf->disks[i].operational) {
> >                                  spin_lock_irq(&conf->device_lock);
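For readers following along without a 2.4 tree handy, the standalone
sketch below shows the list manipulation the patch extends (a model
only: buffer_head is reduced to the b_reqnext link that raid5 uses to
chain requests, the read lists are left out, and the sample data is
invented). When more than one drive has failed, pending and
already-'written' blocks are unlinked from the stripe's per-device
lists and spliced onto a single return_fail list so they can be
completed with an error:

#include <stdio.h>

struct buffer_head {
        int id;
        struct buffer_head *b_reqnext;
};

#define NDISKS 4

int main(void)
{
        struct buffer_head a = {1, 0}, b = {2, 0}, c = {3, 0};
        struct buffer_head *bh_write[NDISKS]   = { &a };       /* writes pending on disk 0 */
        struct buffer_head *bh_written[NDISKS] = { 0, &b };    /* a 'written' block on disk 1 */
        struct buffer_head *return_fail = 0;
        struct buffer_head *bh;
        int failed = 2, to_write = 2, written = 1, i;

        a.b_reqnext = &c;       /* two writes queued on disk 0: a -> c */

        /* as in handle_stripe(): with two failed drives, fail the I/O */
        if (failed > 1 && to_write + written) {
                for (i = NDISKS; i--; ) {
                        /* fail all writes first */
                        if (bh_write[i]) to_write--;
                        while ((bh = bh_write[i])) {
                                bh_write[i] = bh->b_reqnext;
                                bh->b_reqnext = return_fail;
                                return_fail = bh;
                        }
                        /* and fail all 'written' -- the step the patch adds */
                        if (bh_written[i]) written--;
                        while ((bh = bh_written[i])) {
                                bh_written[i] = bh->b_reqnext;
                                bh->b_reqnext = return_fail;
                                return_fail = bh;
                        }
                }
        }

        for (bh = return_fail; bh; bh = bh->b_reqnext)
                printf("failing request %d\n", bh->id);
        return 0;
}

Per the patch description, the pre-patch code handled only the pending
read and write lists, so blocks already sitting on a stripe's 'written'
list when the second drive failed were left behind, which is what
eventually led to the oops.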
