On Thu, 2013-10-31 at 09:36 +0000, Junichi Nomura wrote: > On 10/31/13 03:09, Frank Mayhar wrote: > > On Wed, 2013-10-30 at 11:43 -0400, Mike Snitzer wrote: > >> On Wed, Oct 30 2013 at 11:08am -0400, > >> Frank Mayhar <fmayhar@xxxxxxxxxx> wrote: > >> > >>> On Tue, 2013-10-29 at 21:02 -0400, Mike Snitzer wrote: > >>>> Any interest in this or should I just table it for >= v3.14? > >>> > >>> Sorry, I've been busy putting out another fire. Yes, there's definitely > >>> still interest. I grabbed your revised patch and tested with it. > >>> Unfortunately the timeout doesn't actually fire when requests are queued > >>> due to queue_if_no_path; IIRC the block request queue timeout logic > >>> wasn't triggering. I planned to look into it more deeply figure out why > >>> but I had to spend all last week fixing a nasty race and hadn't gotten > >>> back to it yet. > >> > >> OK, Hannes, any idea why this might be happening? The patch in question > >> is here: https://patchwork.kernel.org/patch/3070391/ > > > > I got to this today and so far the most interesting I see is that the > > cloned request that's queued in multipath has no queue associated with > > it when it's queued; a printk reveals: > > > > [ 517.610042] map_io: queueing rq ffff8801150e0070 q (null) > > > > When it's eventually dequeued, it gets a queue from the destination > > device (in the pgpath) via bdev_get_queue(). > > > > Because of this and from just looking at the code, blk_start_request() > > (and therefore blk_add_timer()) isn't being called for those requests, > > so there's never a chance that the timeout would happen. > > > > Does this make sense? Or am I totally off-base? > > Hi, > > I haven't checked the above patch in detail but there is a problem; > abort_if_no_path() treats "rq" as a clone request, which it isn't. > "rq" is an original request. > > It shouldn't be a correct fix but just for testing purpose, you can try > changing: > info = dm_get_rq_mapinfo(rq); > to > info = dm_get_rq_mapinfo(rq->special); > and see what happens. Well, at the moment this is kind of moot since abort_if_no_path() isn't being called. But, regardless, don't we want to time out the clone request? That is, after all, what is being queued in map_io(). Unfortunately the clones don't appear to be associated with a request queue; they're just put on multipath's internal queue. -- Frank Mayhar 310-460-4042 -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel