On Mon, Mar 21, 2022 at 5:25 PM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
>
>
> >>>>> # nvme connect to target
> >>>>> # nvme reset /dev/nvme0
> >>>>> # nvme disconnect-all
> >>>>> # sleep 10
> >>>>> # echo scan > /sys/kernel/debug/kmemleak
> >>>>> # sleep 60
> >>>>> # cat /sys/kernel/debug/kmemleak
> >>>>>
> >>>> Thanks, I was able to repro it with the above commands.
> >>>>
> >>>> Still not clear where the leak is, but I do see some non-symmetric
> >>>> code in the error flows that we need to fix. Plus the keep-alive timing
> >>>> movement.
> >>>>
> >>>> It will take some time for me to debug this.
> >>>>
> >>>> Can you repro it with tcp transport as well?
> >>>
> >>> Yes, nvme/tcp can also reproduce it, here is the log:
>
> Looks like the offending commit was 8e141f9eb803 ("block: drain file
> system I/O on del_gendisk") which moved the call-site for a reason.
>
> However rq_qos_exit() should be reentrant safe, so can you verify
> that this change eliminates the issue as well?

Yes, this change also fixed the kmemleak, thanks.

> --
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 94bf37f8e61d..6ccc02a41f25 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
>
>  	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>
> +	rq_qos_exit(q);
>  	blk_sync_queue(q);
>  	if (queue_is_mq(q)) {
>  		blk_mq_cancel_work_sync(q);
> --

--
Best Regards,
Yi Zhang
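
[Editor's note on the reentrancy point above: rq_qos_exit() unlinks each
qos policy from the queue's rq_qos list before invoking its exit callback,
so a second call (after del_gendisk() has already drained the list) finds
the list empty and does nothing. Below is a minimal, self-contained sketch
of that pattern; the struct shapes mirror block/blk-rq-qos.h, but
demo_exit(), demo_ops, and the standalone main() are illustrative only,
not kernel code.]

--
#include <stdio.h>
#include <stdlib.h>

struct rq_qos;

struct rq_qos_ops {
	void (*exit)(struct rq_qos *rqos);
};

struct rq_qos {
	const struct rq_qos_ops *ops;
	struct rq_qos *next;
};

struct request_queue {
	struct rq_qos *rq_qos;	/* singly linked list of qos policies */
};

static void rq_qos_exit(struct request_queue *q)
{
	while (q->rq_qos) {
		struct rq_qos *rqos = q->rq_qos;

		q->rq_qos = rqos->next;	/* unlink first... */
		rqos->ops->exit(rqos);	/* ...then tear down the policy */
	}
}

static void demo_exit(struct rq_qos *rqos)
{
	printf("exit callback ran, freeing %p\n", (void *)rqos);
	free(rqos);
}

static const struct rq_qos_ops demo_ops = { .exit = demo_exit };

int main(void)
{
	struct request_queue q = { .rq_qos = NULL };
	struct rq_qos *rqos = malloc(sizeof(*rqos));

	if (!rqos)
		return 1;
	rqos->ops = &demo_ops;
	rqos->next = NULL;
	q.rq_qos = rqos;

	rq_qos_exit(&q);	/* runs the exit callback once */
	rq_qos_exit(&q);	/* list already empty: harmless no-op */
	return 0;
}
--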