Re: [bug report] NVMe/IB: kmemleak observed on 5.17.0-rc5 with nvme-rdma testing

Ming Lei <ming.lei@xxxxxxxxxx> · Tue, 22 Mar 2022 15:36:14 +0800

On Tue, Mar 22, 2022 at 12:58 PM Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
>
> On Mon, Mar 21, 2022 at 5:25 PM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
> >
> >
> > >>>>> # nvme connect to target
> > >>>>> # nvme reset /dev/nvme0
> > >>>>> # nvme disconnect-all
> > >>>>> # sleep 10
> > >>>>> # echo scan > /sys/kernel/debug/kmemleak
> > >>>>> # sleep 60
> > >>>>> # cat /sys/kernel/debug/kmemleak
> > >>>>>
> > >>>> Thanks I was able to repro it with the above commands.
> > >>>>
> > >>>> Still not clear where the leak is, but I do see some non-symmetric
> > >>>> code in the error flows that we need to fix. Plus the keep-alive timing
> > >>>> movement.
> > >>>>
> > >>>> It will take some time for me to debug this.
> > >>>>
> > >>>> Can you repro it with the tcp transport as well?
> > >>>
> > >>> Yes, it also reproduces with nvme/tcp; here is the log:
> >
> > Looks like the offending commit was 8e141f9eb803 ("block: drain file
> > system I/O on del_gendisk") which moved the call-site for a reason.
> >
> > However rq_qos_exit() should be reentrant safe, so can you verify
> > that this change eliminates the issue as well?
>
> Yes, this change also fixed the kmemleak, thanks.
>
> > --
> > diff --git a/block/blk-core.c b/block/blk-core.c
> > index 94bf37f8e61d..6ccc02a41f25 100644
> > --- a/block/blk-core.c
> > +++ b/block/blk-core.c
> > @@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
> >
> >          blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
> >
> > +       rq_qos_exit(q);
> >          blk_sync_queue(q);
> >          if (queue_is_mq(q)) {
> >                  blk_mq_cancel_work_sync(q);

BTW, a similar fix has already been merged into v5.17:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=daaca3522a8e67c46e39ef09c1d542e866f85f3b
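
For anyone following along, the "reentrant safe" property mentioned for
rq_qos_exit() can be pictured with a small userspace sketch. This is not
the kernel code: struct queue, struct policy, queue_attach() and
queue_policies_exit() are hypothetical stand-ins used only to illustrate
the pattern. The teardown unlinks each entry before calling its exit
hook, so a second call finds an empty list and does nothing, which is
why calling it from more than one teardown path would be harmless.

```c
#include <stddef.h>

/* Hypothetical stand-in for a per-queue policy; not a kernel type. */
struct policy {
    struct policy *next;
    void (*exit)(struct policy *p);
};

/* Hypothetical stand-in for a request queue. */
struct queue {
    struct policy *head;    /* singly linked list of attached policies */
};

int exit_calls;             /* how many policies have been torn down */

void policy_exit(struct policy *p)
{
    (void)p;
    exit_calls++;
}

/* Attach a policy at the head of the queue's list. */
void queue_attach(struct queue *q, struct policy *p)
{
    p->exit = policy_exit;
    p->next = q->head;
    q->head = p;
}

/*
 * Idempotent teardown: each policy is unlinked from the list before
 * its exit hook runs, so calling this a second time is a no-op.
 */
void queue_policies_exit(struct queue *q)
{
    while (q->head) {
        struct policy *p = q->head;

        q->head = p->next;
        p->exit(p);
    }
}
```

In this sketch, attaching two policies and calling queue_policies_exit()
twice runs each exit hook exactly once; the second call returns
immediately because the list is already empty.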

Thanks,