On Mon, 2 Aug 2010 16:37:54 -0400
Justin Bronder <jsbronder@xxxxxxxxxx> wrote:

> On 02/08/10 12:58 +1000, Neil Brown wrote:
> > On Mon, 2 Aug 2010 12:29:49 +1000
> > Neil Brown <neilb@xxxxxxx> wrote:
> >
> > >
> > > Ahhhh.... I see the problem.  Because a 'generic_make_request' is already
> > > active, the one called by raid10::make_request just queues the request until
> > > the top level one completes.  This results in a deadlock.
> > >
> > > I'll have to ponder a bit to figure out the best way to fix this.
> > >
> >
> > So, one good strong cup of tea later I think I have a good solution.
> >
> > Would you be able to test with this patch and confirm that you cannot
> > reproduce the hang?
>
> I've been running with this patch on 2.6.34.1 all day and have yet to cause
> the hang.  Given it took under 5 minutes earlier, feel free to add:
>
> Tested-by: Justin Bronder <jsbronder@xxxxxxxxxx>
>
> I really appreciate you taking care of this.

Thanks.  And thank you for testing.
I've queued this up now and will send it to Linus and -stable shortly.

NeilBrown

>
> >
> > Thanks.
> >
> > NeilBrown
> >
> > diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> > index 42e64e4..d1d6891 100644
> > --- a/drivers/md/raid10.c
> > +++ b/drivers/md/raid10.c
> > @@ -825,11 +825,29 @@ static int make_request(mddev_t *mddev, struct bio * bio)
> >  		 */
> >  		bp = bio_split(bio,
> >  			       chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
> > +
> > +		/* Each of these 'make_request' calls will call 'wait_barrier'.
> > +		 * If the first succeeds but the second blocks due to the resync
> > +		 * thread raising the barrier, we will deadlock because the
> > +		 * IO to the underlying device will be queued in generic_make_request
> > +		 * and will never complete, so will never reduce nr_pending.
> > +		 * So increment nr_waiting here so no new raise_barriers will
> > +		 * succeed, and so the second wait_barrier cannot block.
> > +		 */
> > +		spin_lock_irq(&conf->resync_lock);
> > +		conf->nr_waiting++;
> > +		spin_unlock_irq(&conf->resync_lock);
> > +
> >  		if (make_request(mddev, &bp->bio1))
> >  			generic_make_request(&bp->bio1);
> >  		if (make_request(mddev, &bp->bio2))
> >  			generic_make_request(&bp->bio2);
> >
> > +		spin_lock_irq(&conf->resync_lock);
> > +		conf->nr_waiting--;
> > +		wake_up(&conf->wait_barrier);
> > +		spin_unlock_irq(&conf->resync_lock);
> > +
> >  		bio_pair_release(bp);
> >  		return 0;
> >  bad_map:
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at http://vger.kernel.org/majordomo-info.html
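
For readers following the thread, the race described in the patch comment can be sketched as a small single-threaded userspace model. This is only an illustration, not the kernel code: `struct barrier_model`, `try_raise_barrier`, `try_wait_barrier` and `split_request_completes` are hypothetical names, and the model assumes (as the patch comment implies) that the resync thread may only raise the barrier while `nr_waiting` is zero, and that normal IO may only proceed while the barrier is down.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the barrier counters in conf. */
struct barrier_model {
	int barrier;    /* > 0 while resync holds the barrier */
	int nr_pending; /* normal IO that got past the barrier and is in flight */
	int nr_waiting; /* normal IO waiting; the patch also bumps it around a split */
};

/* Assumption: resync may raise the barrier only when nothing is waiting. */
static bool try_raise_barrier(struct barrier_model *c)
{
	if (c->nr_waiting > 0)
		return false;
	c->barrier++;
	return true;
}

/* Returns false at the point where the real wait_barrier would sleep. */
static bool try_wait_barrier(struct barrier_model *c)
{
	if (c->barrier > 0)
		return false;
	c->nr_pending++;
	return true;
}

/*
 * Model one split request, with resync trying to raise the barrier between
 * the two halves. Returns true if both halves get past the barrier.
 */
static bool split_request_completes(bool with_patch)
{
	struct barrier_model c = { 0, 0, 0 };
	bool first, second;

	if (with_patch)
		c.nr_waiting++;         /* what the patch does before the two calls */

	first = try_wait_barrier(&c);   /* first half: nr_pending becomes 1 */
	assert(first);                  /* nothing has raised the barrier yet */

	try_raise_barrier(&c);          /* resync thread races in here */

	/*
	 * If the barrier went up, the second half would sleep while the first
	 * half's IO sits queued in generic_make_request, so nr_pending never
	 * drops and resync never finishes: the deadlock from the thread.
	 */
	second = try_wait_barrier(&c);

	if (with_patch)
		c.nr_waiting--;

	return first && second;
}
```

In this model `split_request_completes(false)` reports that the second half is stuck behind the barrier, while `split_request_completes(true)` lets both halves through because the raised `nr_waiting` keeps resync from raising the barrier in between.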