# nvme connect to target
# nvme reset /dev/nvme0
# nvme disconnect-all
# sleep 10
# echo scan > /sys/kernel/debug/kmemleak
# sleep 60
# cat /sys/kernel/debug/kmemleak
Thanks, I was able to repro it with the above commands.
Still not clear where the leak is, but I do see some non-symmetric
code in the error flows that we need to fix, plus the keep-alive timing
movement.
It will take some time for me to debug this.
Can you reproduce it with the tcp transport as well?
Yes, it reproduces with nvme/tcp as well; here is the log:
unreferenced object 0xffff8881675f7000 (size 192):
comm "nvme", pid 3711, jiffies 4296033311 (age 2272.976s)
hex dump (first 32 bytes):
20 59 04 92 ff ff ff ff 00 00 da 13 81 88 ff ff Y..............
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
[<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
[<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
[<000000002653e58d>] blk_alloc_queue+0x400/0x840
[<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
[<00000000486936b6>] nvme_tcp_setup_ctrl+0x70c/0xbe0 [nvme_tcp]
[<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
[<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
[<0000000056b79a25>] vfs_write+0x17e/0x9a0
[<00000000a5af6c18>] ksys_write+0xf1/0x1c0
[<00000000c035c128>] do_syscall_64+0x3a/0x80
[<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff8881675f7600 (size 192):
comm "nvme", pid 3711, jiffies 4296033320 (age 2272.967s)
hex dump (first 32 bytes):
20 59 04 92 ff ff ff ff 00 00 22 92 81 88 ff ff Y........".....
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
[<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
[<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
[<000000002653e58d>] blk_alloc_queue+0x400/0x840
[<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
[<000000006ca5f9f6>] nvme_tcp_setup_ctrl+0x772/0xbe0 [nvme_tcp]
[<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
[<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
[<0000000056b79a25>] vfs_write+0x17e/0x9a0
[<00000000a5af6c18>] ksys_write+0xf1/0x1c0
[<00000000c035c128>] do_syscall_64+0x3a/0x80
[<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff8891fb6a3600 (size 192):
comm "nvme", pid 3711, jiffies 4296033511 (age 2272.776s)
hex dump (first 32 bytes):
20 59 04 92 ff ff ff ff 00 00 5c 1d 81 88 ff ff Y........\.....
01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
[<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
[<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
[<000000002653e58d>] blk_alloc_queue+0x400/0x840
[<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
[<000000004a3bf20e>] nvme_tcp_setup_ctrl.cold.57+0x868/0xa5d [nvme_tcp]
[<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
[<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
[<0000000056b79a25>] vfs_write+0x17e/0x9a0
[<00000000a5af6c18>] ksys_write+0xf1/0x1c0
[<00000000c035c128>] do_syscall_64+0x3a/0x80
[<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
Looks like there is some asymmetry in blk_iolatency: it is initialized
when allocating a request queue, but exited only when deleting a gendisk.
In nvme we have request queues that will never have a gendisk corresponding
to them (like the admin queue), so their iolatency state is never freed.
Does this patch eliminate the issue?
--
diff --git a/block/blk-core.c b/block/blk-core.c
index 94bf37f8e61d..6ccc02a41f25 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -323,6 +323,7 @@ void blk_cleanup_queue(struct request_queue *q)
 	blk_queue_flag_set(QUEUE_FLAG_DEAD, q);
+	rq_qos_exit(q);
 	blk_sync_queue(q);
 	if (queue_is_mq(q)) {
 		blk_mq_cancel_work_sync(q);
diff --git a/block/genhd.c b/block/genhd.c
index 54f60ded2ee6..10ff0606c100 100644
--- a/block/genhd.c
+++ b/block/genhd.c
@@ -626,7 +626,6 @@ void del_gendisk(struct gendisk *disk)
 	blk_mq_freeze_queue_wait(q);
-	rq_qos_exit(q);
 	blk_sync_queue(q);
 	blk_flush_integrity();
 	/*
--