On 02/08/10 12:58 +1000, Neil Brown wrote:
> On Mon, 2 Aug 2010 12:29:49 +1000
> Neil Brown <neilb@xxxxxxx> wrote:
>
> >
> > Ahhhh.... I see the problem.  Because a 'generic_make_request' is already
> > active, the once called by raid10::make_request just queues the request until
> > the top level one completes.  This results in a deadlock.
> >
> > I'll have to ponder a bit to figure out the best way to fix this.
> >
>
> So, one good strong cup of tea later I think I have a good solution.
>
> Would you be able to test with this patch and confirm that you cannot
> reproduce the hang?

I've been running with this patch on 2.6.34.1 all day and have yet to
cause the hang.  Given it took under 5 minutes earlier, feel free to add:

Tested-by: Justin Bronder <jsbronder@xxxxxxxxxx>

I really appreciate you taking care of this.  Thanks.

> Thanks.
>
> NeilBrown
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 42e64e4..d1d6891 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -825,11 +825,29 @@ static int make_request(mddev_t *mddev, struct bio * bio)
>  		 */
>  		bp = bio_split(bio,
>  			       chunk_sects - (bio->bi_sector & (chunk_sects - 1)) );
> +
> +		/* Each of these 'make_request' calls will call 'wait_barrier'.
> +		 * If the first succeeds but the second blocks due to the resync
> +		 * thread raising the barrier, we will deadlock because the
> +		 * IO to the underlying device will be queued in generic_make_request
> +		 * and will never complete, so will never reduce nr_pending.
> +		 * So increment nr_waiting here so no new raise_barriers will
> +		 * succeed, and so the second wait_barrier cannot block.
> +		 */
> +		spin_lock_irq(&conf->resync_lock);
> +		conf->nr_waiting++;
> +		spin_unlock_irq(&conf->resync_lock);
> +
>  		if (make_request(mddev, &bp->bio1))
>  			generic_make_request(&bp->bio1);
>  		if (make_request(mddev, &bp->bio2))
>  			generic_make_request(&bp->bio2);
>
> +		spin_lock_irq(&conf->resync_lock);
> +		conf->nr_waiting--;
> +		wake_up(&conf->wait_barrier);
> +		spin_unlock_irq(&conf->resync_lock);
> +
>  		bio_pair_release(bp);
>  		return 0;
>  	bad_map:
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

-- 
Justin Bronder
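
For anyone reading this thread later, the reasoning in the patch comment
can be replayed with a small userspace model.  This is only a sketch, not
kernel code: the struct and the helper names (barrier_model,
try_raise_barrier, try_wait_barrier, split_request_completes) are
hypothetical, there is no locking or sleeping, and the rules are a
simplified reading of the comment above: the resync thread may only raise
the barrier while nr_waiting is zero, and normal IO blocks while the
barrier is raised.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of the barrier counters; not kernel code. */
struct barrier_model {
	int barrier;	/* raised by the resync thread */
	int nr_pending;	/* normal IO that has passed the barrier */
	int nr_waiting;	/* normal IO waiting for the barrier to drop */
};

/* Resync side: assumed rule, the barrier is only raised when nobody waits. */
static bool try_raise_barrier(struct barrier_model *m)
{
	if (m->nr_waiting > 0)
		return false;	/* the real code would sleep here */
	m->barrier++;
	return true;
}

/* IO side: returns false at the point where the real code would block. */
static bool try_wait_barrier(struct barrier_model *m)
{
	if (m->barrier > 0)
		return false;
	m->nr_pending++;
	return true;
}

/*
 * Replay the split-bio path with the resync thread trying to raise the
 * barrier between the two halves.  Returns true if both halves get past
 * the barrier.  hold_nr_waiting selects the patched behaviour.
 */
static bool split_request_completes(bool hold_nr_waiting)
{
	struct barrier_model m = { 0, 0, 0 };
	bool ok;

	if (hold_nr_waiting)
		m.nr_waiting++;

	ok = try_wait_barrier(&m);		/* first half */
	(void)try_raise_barrier(&m);		/* resync races in here */
	ok = ok && try_wait_barrier(&m);	/* second half */

	if (hold_nr_waiting) {
		assert(m.nr_waiting > 0);
		m.nr_waiting--;
	}
	return ok;
}
```

In this model split_request_completes(false) reports that the second half
would block, and split_request_completes(true) lets both halves through.
What the model leaves out is why a blocked second half is fatal in the
real driver: per the patch comment, the first half's IO is still queued in
generic_make_request, so nr_pending never drops and the barrier is never
lowered.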