On Thu, Jul 18 2019 at 9:35pm -0400,
Ming Lei <ming.lei@xxxxxxxxxx> wrote:

> On Thu, Jul 18, 2019 at 10:52:01AM -0400, Mike Snitzer wrote:
> > On Wed, Jul 17 2019 at 11:25pm -0400,
> > Ming Lei <ming.lei@xxxxxxxxxx> wrote:
> >
> > > dm-rq needs to free request which has been dispatched and not completed
> > > by underlying queue. However, the underlying queue may have allocated
> > > private stuff for this request in .queue_rq(), so dm-rq will leak the
> > > request private part.
> >
> > No, SCSI (and blk-mq) will leak.  DM doesn't know anything about the
> > internal memory SCSI uses.  That memory is a SCSI implementation detail.
>
> It isn't nothing to do with dm-rq, which frees one request after BLK_STS_*RESOURCE
> is returned from blk_insert_cloned_request(); in this case it has to be
> the user that releases the request private data.
>
> >
> > Please fix header to properly reflect which layer is doing the leaking.
>
> Fine.
>
> >
> > > Add one new callback of .cleanup_rq() to fix the memory leak issue.
> > >
> > > Another use case is to free request when the hctx is dead during
> > > cpu hotplug context.
> > >
> > > Cc: Ewan D. Milne <emilne@xxxxxxxxxx>
> > > Cc: Bart Van Assche <bvanassche@xxxxxxx>
> > > Cc: Hannes Reinecke <hare@xxxxxxxx>
> > > Cc: Christoph Hellwig <hch@xxxxxx>
> > > Cc: Mike Snitzer <snitzer@xxxxxxxxxx>
> > > Cc: dm-devel@xxxxxxxxxx
> > > Cc: <stable@xxxxxxxxxxxxxxx>
> > > Fixes: 396eaf21ee17 ("blk-mq: improve DM's blk-mq IO merging via blk_insert_cloned_request feedback")
> > > Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> > > ---
> > >  drivers/md/dm-rq.c     |  1 +
> > >  include/linux/blk-mq.h | 13 +++++++++++++
> > >  2 files changed, 14 insertions(+)
> > >
> > > diff --git a/drivers/md/dm-rq.c b/drivers/md/dm-rq.c
> > > index c9e44ac1f9a6..21d5c1784d0c 100644
> > > --- a/drivers/md/dm-rq.c
> > > +++ b/drivers/md/dm-rq.c
> > > @@ -408,6 +408,7 @@ static int map_request(struct dm_rq_target_io *tio)
> > >  		ret = dm_dispatch_clone_request(clone, rq);
> > >  		if (ret == BLK_STS_RESOURCE || ret == BLK_STS_DEV_RESOURCE) {
> > >  			blk_rq_unprep_clone(clone);
> > > +			blk_mq_cleanup_rq(clone);
> > >  			tio->ti->type->release_clone_rq(clone, &tio->info);
> > >  			tio->clone = NULL;
> > >  			return DM_MAPIO_REQUEUE;
> >
> > Requiring upper layer driver (dm-rq) to explicitly call blk_mq_cleanup_rq()
> > seems wrong.  In this instance tio->ti->type->release_clone_rq()
> > (dm-mpath's multipath_release_clone) calls blk_put_request().  Why can't
> > blk_put_request(), or blk_mq_free_request(), call blk_mq_cleanup_rq()?
>
> I did think about doing it in blk_put_request(), and I just want to
> avoid the little cost in the generic fast path, given that freeing a request
> after dispatch is very unusual; so far only nvme multipath and dm-rq do it
> that way.
>
> However, if no one objects to moving blk_mq_cleanup_rq() to blk_put_request()
> or blk_mq_free_request(), I am fine to do that in V2.

I think it'd be a less fragile/nuanced way to extend the blk-mq interface.
Otherwise there is potential for other future drivers to hit the same leak.

> > Not looked at the cpu hotplug case you mention, but my naive thought is
> > it'd be pretty weird to also sprinkle a call to blk_mq_cleanup_rq() from
> > that specific "dead hctx" code path.
>
> It isn't weird, and it is exactly what NVMe multipath is doing, please see
> nvme_failover_req().  And it is just that nvme doesn't allocate request
> private data.
>
> Wrt. blk-mq cpu hotplug handling: after one hctx is dead, we can't dispatch
> requests to this hctx any more; however, a request has been bound to its
> hctx since its allocation, and that association can't (or can only with
> difficulty) be changed, so do you have any better idea to deal with this issue?

No, as I prefaced before, "Not looked at the cpu hotplug case you
mention".  As such I should've stayed silent ;)

But my point was we should hook off current interfaces rather than rely
on a new primary function call.

Mike
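
For readers following the thread: the include/linux/blk-mq.h side of the patch
is trimmed from the quote above. Based on the diffstat and the
blk_mq_cleanup_rq() call added in the dm-rq hunk, the new hook and helper
presumably look roughly like the sketch below (an assumption, not the quoted
hunk itself):

/*
 * Sketch only -- the actual include/linux/blk-mq.h hunk is not quoted in
 * this mail.  The per-driver hook would live in struct blk_mq_ops:
 *
 *	void (*cleanup_rq)(struct request *);
 *
 * called before freeing a request that was dispatched (seen by .queue_rq())
 * but never completed, so the driver can release its private data.
 */
static inline void blk_mq_cleanup_rq(struct request *rq)
{
	if (rq->q->mq_ops->cleanup_rq)
		rq->q->mq_ops->cleanup_rq(rq);
}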
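
The alternative placement discussed above -- moving the call into the core
free path so dm-rq would not need to call it explicitly -- might look roughly
like the following. This is only an illustration of the suggestion, not a
posted V2:

/*
 * Illustration of the suggested V2 placement: let blk_mq_free_request()
 * (reached via blk_put_request()) invoke the cleanup hook, so upper layers
 * such as dm-rq need no explicit blk_mq_cleanup_rq() call of their own.
 */
void blk_mq_free_request(struct request *rq)
{
	/*
	 * Give the driver a chance to free private data of a request that
	 * was dispatched but never completed, before the request is released.
	 */
	blk_mq_cleanup_rq(rq);

	/* ... existing teardown: scheduler cleanup, tag release, etc. ... */
}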