Hi Hannes, Thank you for the comments and patches. See my comments below. On 01/29/2009 07:41 PM +0900, Hannes Reinecke wrote: > On Thu, Jan 29, 2009 at 04:18:59PM +0900, Kiyoshi Ueda wrote: >> Hi Alasdair, >> >> On 01/29/2009 12:40 AM +0900, Alasdair G Kergon wrote: >>> On Fri, Oct 03, 2008 at 11:08:25AM -0400, Kiyoshi Ueda wrote: >>>> This patch-set is the updated version of request-based dm-multipath. >>>> The changes from the previous version (*) are to follow up the change >>>> of the interface to export lld's busy state (PATCH 5). >>> I've fixed them up to compile again, but haven't thoroughly checked for >>> side-effects. >> I found some problems below in my patches and now considering >> how to fix them: >> o Unnecessary EIO is returned to application if it submits >> a bio during table swapping. > Yes, I've noticed that. Problem is this: > > --- linux-2.6.27.orig/drivers/md/dm.c > +++ linux-2.6.27/drivers/md/dm.c > @@ -1304,7 +1304,11 @@ static int dm_make_request(struct reques > return 0; > } > > - if (unlikely(!md->map)) { > + /* > + * Submitting to a stopped queue with no map is okay; > + * might happen during reconfiguration. > + */ > + if (unlikely(!md->map) && !blk_queue_stopped(q)) { > bio_endio(bio, -EIO); > return 0; > } > > The make_request callback should never return EIO if there's any > chance at all to get this request done. Exactly this part has a race condition with table swapping. Although you patch fixes the race condition, I think I need more careful consideration here. (e.g. Is it really OK to go bios through __make_request() with old queue restrictions during table swapping?) I'll work on this problem after my vacation. >> o kernel panic occurs by frequent table swapping during heavy I/Os. >> > That's probably fixed by this patch: > > --- linux-2.6.27/drivers/md/dm.c.orig 2009-01-23 15:59:22.741461315 +0100 > +++ linux-2.6.27/drivers/md/dm.c 2009-01-26 09:03:02.787605723 +0100 > @@ -714,13 +714,14 @@ static void free_bio_clone(struct reques > struct dm_rq_target_io *tio = clone->end_io_data; > struct mapped_device *md = tio->md; > struct bio *bio; > - struct dm_clone_bio_info *info; > > while ((bio = clone->bio) != NULL) { > clone->bio = bio->bi_next; > > - info = bio->bi_private; > - free_bio_info(md, info); > + if (bio->bi_private) { > + struct dm_clone_bio_info *info = bio->bi_private; > + free_bio_info(md, info); > + } > > bio->bi_private = md->bs; > bio_put(bio); > > The info field is not necessarily filled here, so we have to check for it > explicitly. > > With these two patches request-based multipathing have survived all stress-tests > so far. Except on mainframe (zfcp), but that's more a driver-related thing. Hmm, it seems I'm hitting a different problem from the one you hit, because I still see my problem even with your patch. I need more investigation after my vacation. Thanks, Kiyoshi Ueda -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel