On Tue, Mar 22, 2022 at 12:58 PM Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
>
> On Mon, Mar 21, 2022 at 5:25 PM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
> >
> >
> > >>>>> # nvme connect to target
> > >>>>> # nvme reset /dev/nvme0
> > >>>>> # nvme disconnect-all
> > >>>>> # sleep 10
> > >>>>> # echo scan > /sys/kernel/debug/kmemleak
> > >>>>> # sleep 60
> > >>>>> # cat /sys/kernel/debug/kmemleak
> > >>>>>
> > >>>> Thanks, I was able to repro it with the above commands.
> > >>>>
> > >>>> Still not clear where the leak is, but I do see some non-symmetric
> > >>>> code in the error flows that we need to fix. Plus the keep-alive timing
> > >>>> movement.
> > >>>>
> > >>>> It will take some time for me to debug this.
> > >>>>
> > >>>> Can you repro it with tcp transport as well?
> > >>>
> > >>> Yes, nvme/tcp can also reproduce it, here is the log:
> >
> > Looks like the offending commit was 8e141f9eb803 ("block: drain file
> > system I/O on del_gendisk") which moved the call-site for a reason.
> >
> > However rq_qos_exit() should be reentrant safe, so can you verify
> > that this change eliminates the issue as well?
>
> Yes, this change also fixed the kmemleak, thanks.
>
> > --
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 94bf37f8e61d..6ccc02a41f25 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
> >
> >         blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
> >
> > +       rq_qos_exit(q);
> >         blk_sync_queue(q);
> >         if (queue_is_mq(q)) {
> >                 blk_mq_cancel_work_sync(q);

BTW, a similar fix has been merged to v5.17:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=daaca3522a8e67c46e39ef09c1d542e866f85f3b

Thanks,
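
For anyone following along, here is a rough user-space sketch of the property the quoted patch relies on, namely that a teardown routine can safely be called from two call sites. This is only an illustration, not the kernel code: toy_queue, toy_policy, toy_policy_add() and toy_policy_exit() are made-up names, and the assumption is simply that the teardown unlinks each entry from the list head before releasing it, so a second call finds an empty list and does nothing.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical user-space stand-ins; these are not the kernel's types. */
struct toy_policy {
    struct toy_policy *next;
};

struct toy_queue {
    struct toy_policy *policies;  /* head of a singly linked list */
    int freed;                    /* number of policies released so far */
};

/* Attach a freshly allocated policy to the queue; returns 0 on success. */
static int toy_policy_add(struct toy_queue *q)
{
    struct toy_policy *p = malloc(sizeof(*p));

    if (!p)
        return -1;
    p->next = q->policies;
    q->policies = p;
    return 0;
}

/*
 * Teardown that is safe to call more than once: each node is unlinked
 * from the head before it is freed, so a repeated call sees an empty
 * list and returns without touching freed memory.
 */
static void toy_policy_exit(struct toy_queue *q)
{
    assert(q != NULL);
    while (q->policies) {
        struct toy_policy *p = q->policies;

        q->policies = p->next;
        free(p);
        q->freed++;
    }
}
```

With a teardown shaped like this, calling it from both an early and a late cleanup path costs nothing on the second call, which is why adding an extra call site is a low-risk way to plug a leak on a path that skipped the first one.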