On Tue, Mar 22, 2022 at 3:36 PM Ming Lei <ming.lei@xxxxxxxxxx> wrote:
>
> On Tue, Mar 22, 2022 at 12:58 PM Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
> >
> > On Mon, Mar 21, 2022 at 5:25 PM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
> > >
> > >
> > > >>>>> # nvme connect to target
> > > >>>>> # nvme reset /dev/nvme0
> > > >>>>> # nvme disconnect-all
> > > >>>>> # sleep 10
> > > >>>>> # echo scan > /sys/kernel/debug/kmemleak
> > > >>>>> # sleep 60
> > > >>>>> # cat /sys/kernel/debug/kmemleak
> > > >>>>>
> > > >>>> Thanks I was able to repro it with the above commands.
> > > >>>>
> > > >>>> Still not clear where is the leak is, but I do see some non-symmetric
> > > >>>> code in the error flows that we need to fix. Plus the keep-alive timing
> > > >>>> movement.
> > > >>>>
> > > >>>> It will take some time for me to debug this.
> > > >>>>
> > > >>>> Can you repro it with tcp transport as well ?
> > > >>>
> > > >>> Yes, nvme/tcp also can reproduce it, here is the log:
> > >
> > > Looks like the offending commit was 8e141f9eb803 ("block: drain file
> > > system I/O on del_gendisk") which moved the call-site for a reason.
> > >
> > > However rq_qos_exit() should be reentrant safe, so can you verify
> > > that this change eliminates the issue as well?
> >
> > Yes, this change also fixed the kmemleak, thanks.
> >
> > > --
> > > diff --git a/block/blk-core.c b/block/blk-core.c
> > > index 94bf37f8e61d..6ccc02a41f25 100644
> > > --- a/block/blk-core.c
> > > +++ b/block/blk-core.c
> > > @@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
> > >
> > >  	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
> > >
> > > +	rq_qos_exit(q);
> > >  	blk_sync_queue(q);
> > >  	if (queue_is_mq(q)) {
> > >  		blk_mq_cancel_work_sync(q);
>
> BTW, the similar fix has been merged to v5.17:

Thanks Ming, confirmed the kmemleak was fixed on v5.17

>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=daaca3522a8e67c46e39ef09c1d542e866f85f3b
>
> Thanks,
>

--
Best Regards,
  Yi Zhang