On Sun, Mar 20, 2022 at 9:00 PM Sagi Grimberg <sagi@xxxxxxxxxxx> wrote:
>
> >>> # nvme connect to target
> >>> # nvme reset /dev/nvme0
> >>> # nvme disconnect-all
> >>> # sleep 10
> >>> # echo scan > /sys/kernel/debug/kmemleak
> >>> # sleep 60
> >>> # cat /sys/kernel/debug/kmemleak
> >>>
> >> Thanks, I was able to repro it with the above commands.
> >>
> >> Still not clear where the leak is, but I do see some non-symmetric
> >> code in the error flows that we need to fix. Plus the keep-alive
> >> timing movement.
> >>
> >> It will take some time for me to debug this.
> >>
> >> Can you repro it with the tcp transport as well?
> >
> > Yes, nvme/tcp can also reproduce it, here is the log:
> >
> > unreferenced object 0xffff8881675f7000 (size 192):
> >   comm "nvme", pid 3711, jiffies 4296033311 (age 2272.976s)
> >   hex dump (first 32 bytes):
> >     20 59 04 92 ff ff ff ff 00 00 da 13 81 88 ff ff   Y..............
> >     01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >   backtrace:
> >     [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
> >     [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
> >     [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
> >     [<000000002653e58d>] blk_alloc_queue+0x400/0x840
> >     [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
> >     [<00000000486936b6>] nvme_tcp_setup_ctrl+0x70c/0xbe0 [nvme_tcp]
> >     [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
> >     [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
> >     [<0000000056b79a25>] vfs_write+0x17e/0x9a0
> >     [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
> >     [<00000000c035c128>] do_syscall_64+0x3a/0x80
> >     [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > unreferenced object 0xffff8881675f7600 (size 192):
> >   comm "nvme", pid 3711, jiffies 4296033320 (age 2272.967s)
> >   hex dump (first 32 bytes):
> >     20 59 04 92 ff ff ff ff 00 00 22 92 81 88 ff ff   Y........".....
> >     01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >   backtrace:
> >     [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
> >     [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
> >     [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
> >     [<000000002653e58d>] blk_alloc_queue+0x400/0x840
> >     [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
> >     [<000000006ca5f9f6>] nvme_tcp_setup_ctrl+0x772/0xbe0 [nvme_tcp]
> >     [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
> >     [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
> >     [<0000000056b79a25>] vfs_write+0x17e/0x9a0
> >     [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
> >     [<00000000c035c128>] do_syscall_64+0x3a/0x80
> >     [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> > unreferenced object 0xffff8891fb6a3600 (size 192):
> >   comm "nvme", pid 3711, jiffies 4296033511 (age 2272.776s)
> >   hex dump (first 32 bytes):
> >     20 59 04 92 ff ff ff ff 00 00 5c 1d 81 88 ff ff   Y........\.....
> >     01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >   backtrace:
> >     [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
> >     [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
> >     [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
> >     [<000000002653e58d>] blk_alloc_queue+0x400/0x840
> >     [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
> >     [<000000004a3bf20e>] nvme_tcp_setup_ctrl.cold.57+0x868/0xa5d [nvme_tcp]
> >     [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
> >     [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
> >     [<0000000056b79a25>] vfs_write+0x17e/0x9a0
> >     [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
> >     [<00000000c035c128>] do_syscall_64+0x3a/0x80
> >     [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
>
> Looks like there is some asymmetry in blk-iolatency: it is initialized
> when allocating a request queue but exited only when deleting a genhd.
> In nvme we have request queues that will never have a genhd corresponding
> to them (like the admin queue).
>
> Does this patch eliminate the issue?

Yes, the nvme/rdma and nvme/tcp kmemleaks are fixed with this change.

> --
> diff --git a/block/blk-core.c b/block/blk-core.c
> index 94bf37f8e61d..6ccc02a41f25 100644
> --- a/block/blk-core.c
> +++ b/block/blk-core.c
> @@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
>
>         blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
>
> +       rq_qos_exit(q);
>         blk_sync_queue(q);
>         if (queue_is_mq(q)) {
>                 blk_mq_cancel_work_sync(q);
> diff --git a/block/genhd.c b/block/genhd.c
> index 54f60ded2ee6..10ff0606c100 100644
> --- a/block/genhd.c
> +++ b/block/genhd.c
> @@ -626,7 +626,6 @@ void del_gendisk(struct gendisk *disk)
>
>         blk_mq_freeze_queue_wait(q);
>
> -       rq_qos_exit(q);
>         blk_sync_queue(q);
>         blk_flush_integrity();
>         /*
> --

--
Best Regards,
  Yi Zhang