On 10/31/13 03:09, Frank Mayhar wrote:
> On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote:
>> On Wed, Oct 30 2013 at 11:08am -0400,
>> Frank Mayhar <fmayhar@xxxxxxxxxx> wrote:
>>
>>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote:
>>>> Any interest in this or should I just table it for >= v3.14?
>>>
>>> Sorry, I've been busy putting out another fire. Yes, there's definitely
>>> still interest. I grabbed your revised patch and tested with it.
>>> Unfortunately the timeout doesn't actually fire when requests are queued
>>> due to queue_if_no_path; IIRC the block request queue timeout logic
>>> wasn't triggering. I planned to look into it more deeply figure out why
>>> but I had to spend all last week fixing a nasty race and hadn't gotten
>>> back to it yet.
>>
>> OK, Hannes, any idea why this might be happening? The patch in question
>> is here: https://patchwork.kernel.org/patch/3070391/
>
> I got to this today and so far the most interesting I see is that the
> cloned request that's queued in multipath has no queue associated with
> it when it's queued; a printk reveals:
>
> [ 517.610042] map_io: queueing rq ffff8801150e0070 q (null)
>
> When it's eventually dequeued, it gets a queue from the destination
> device (in the pgpath) via bdev_get_queue().
>
> Because of this and from just looking at the code, blk_start_request()
> (and therefore blk_add_timer()) isn't being called for those requests,
> so there's never a chance that the timeout would happen.
>
> Does this make sense? Or am I totally off-base?

Hi,

I haven't checked the above patch in detail, but there is a problem:
abort_if_no_path() treats "rq" as a clone request, which it isn't.
"rq" is an original request.

This is not meant as a correct fix, but just for testing purposes
you can try changing:
  info = dm_get_rq_mapinfo(rq);
to
  info = dm_get_rq_mapinfo(rq->special);
and see what happens.
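To illustrate why the lookup matters, here is a minimal userspace sketch. It is not kernel code. All mock_* names are hypothetical, and the layout is an assumption about request-based dm of that era: the original request's "special" field points at the clone, and the clone's "end_io_data" holds the per-request target data that dm_get_rq_mapinfo() reads.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace model; these are NOT the kernel's definitions. */
struct mock_map_info { void *ptr; };

/* Stands in for the per-request target I/O structure (assumption). */
struct mock_tio {
    struct mock_map_info info;
};

struct mock_request {
    void *special;      /* on the original: assumed to point at the clone */
    void *end_io_data;  /* on the clone: assumed to point at the tio */
};

/* Models dm_get_rq_mapinfo(): only meaningful when handed a clone. */
static struct mock_map_info *mock_get_rq_mapinfo(struct mock_request *rq)
{
    if (rq && rq->end_io_data)
        return &((struct mock_tio *)rq->end_io_data)->info;
    return NULL;
}

/* Models the prep step that links original -> clone -> tio. */
static void mock_prep(struct mock_request *orig, struct mock_request *clone,
                      struct mock_tio *tio)
{
    assert(orig && clone && tio);
    clone->special = NULL;
    clone->end_io_data = tio;
    orig->special = clone;
    orig->end_io_data = NULL;   /* assumption: unset on the original */
}
```

In this model, passing the original request to the lookup finds nothing, while passing orig->special (the clone) reaches the map info. That mirrors the suggested test change from rq to rq->special.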
-- 
Jun'ichi Nomura, NEC Corporation

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/dm-devel