On Thu, Mar 10, 2022 at 7:52 PM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
>
>
> On 3/9/2022 12:59 AM, Yi Zhang wrote:
> > On Tue, Mar 8, 2022 at 11:51 PM Max Gurtovoy <mgurtovoy@xxxxxxxxxx> wrote:
> >> Hi Yi Zhang,
> >>
> >> Please send the commands to repro.
> >>
> >> I ran the following but could not repro it:
> >>
> >> for i in `seq 100`; do echo $i && cat /sys/kernel/debug/kmemleak &&
> >> echo clear > /sys/kernel/debug/kmemleak && nvme reset /dev/nvme2 &&
> >> sleep 5 && echo scan > /sys/kernel/debug/kmemleak ; done
> > Hi Max
> > Sorry, I should have added more details when I reported it.
> > The kmemleak was observed while I was reproducing the "nvme reset" timeout
> > issue we discussed before [1], and the commands I used are in [2].
> >
> > [1]
> > https://lore.kernel.org/linux-nvme/CAHj4cs_ir917u7Up5PBfwWpZtnVLey69pXXNjFNAjbqQ5vwU0w@mail.gmail.com/T/#m5e6dcc434fc1925b18047c348226cfbc48ffbd14
> > [2]
> > # nvme connect to target
> > # nvme reset /dev/nvme0
> > # nvme disconnect-all
> > # sleep 10
> > # echo scan > /sys/kernel/debug/kmemleak
> > # sleep 60
> > # cat /sys/kernel/debug/kmemleak
> >
> Thanks, I was able to repro it with the above commands.
>
> It is still not clear where the leak is, but I do see some non-symmetric
> code in the error flows that we need to fix. Plus the keep-alive timing
> movement.
>
> It will take some time for me to debug this.
>
> Can you repro it with the tcp transport as well?

Yes, nvme/tcp can also reproduce it; here is the log:

unreferenced object 0xffff8881675f7000 (size 192):
  comm "nvme", pid 3711, jiffies 4296033311 (age 2272.976s)
  hex dump (first 32 bytes):
    20 59 04 92 ff ff ff ff 00 00 da 13 81 88 ff ff   Y..............
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
    [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
    [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
    [<000000002653e58d>] blk_alloc_queue+0x400/0x840
    [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
    [<00000000486936b6>] nvme_tcp_setup_ctrl+0x70c/0xbe0 [nvme_tcp]
    [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
    [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000056b79a25>] vfs_write+0x17e/0x9a0
    [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
    [<00000000c035c128>] do_syscall_64+0x3a/0x80
    [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff8881675f7600 (size 192):
  comm "nvme", pid 3711, jiffies 4296033320 (age 2272.967s)
  hex dump (first 32 bytes):
    20 59 04 92 ff ff ff ff 00 00 22 92 81 88 ff ff   Y........".....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
    [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
    [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
    [<000000002653e58d>] blk_alloc_queue+0x400/0x840
    [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
    [<000000006ca5f9f6>] nvme_tcp_setup_ctrl+0x772/0xbe0 [nvme_tcp]
    [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
    [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000056b79a25>] vfs_write+0x17e/0x9a0
    [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
    [<00000000c035c128>] do_syscall_64+0x3a/0x80
    [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae
unreferenced object 0xffff8891fb6a3600 (size 192):
  comm "nvme", pid 3711, jiffies 4296033511 (age 2272.776s)
  hex dump (first 32 bytes):
    20 59 04 92 ff ff ff ff 00 00 5c 1d 81 88 ff ff   Y........\.....
    01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
  backtrace:
    [<00000000adbc7c81>] kmem_cache_alloc_trace+0x10e/0x220
    [<00000000c04d85be>] blk_iolatency_init+0x4e/0x380
    [<00000000897ffdaf>] blkcg_init_queue+0x12e/0x610
    [<000000002653e58d>] blk_alloc_queue+0x400/0x840
    [<00000000fcb99f3c>] blk_mq_init_queue_data+0x6a/0x100
    [<000000004a3bf20e>] nvme_tcp_setup_ctrl.cold.57+0x868/0xa5d [nvme_tcp]
    [<000000000bb29b26>] nvme_tcp_create_ctrl+0x953/0xbb4 [nvme_tcp]
    [<00000000ca3d4e54>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
    [<0000000056b79a25>] vfs_write+0x17e/0x9a0
    [<00000000a5af6c18>] ksys_write+0xf1/0x1c0
    [<00000000c035c128>] do_syscall_64+0x3a/0x80
    [<000000000e5ea863>] entry_SYSCALL_64_after_hwframe+0x44/0xae

>
> Maybe add some debug prints to catch the exact flow where it happens?
>
> >> -Max.
> >>
> >> On 2/21/2022 1:37 PM, Yi Zhang wrote:
> >>> Hello
> >>>
> >>> The kmemleak below was triggered when I did nvme connect/reset/disconnect
> >>> operations on the latest 5.17.0-rc5; please check it.
> >>>
> >>> # cat /sys/kernel/debug/kmemleak
> >>> unreferenced object 0xffff8883e398bc00 (size 192):
> >>>   comm "nvme", pid 2632, jiffies 4295317772 (age 2951.476s)
> >>>   hex dump (first 32 bytes):
> >>>     80 50 84 a3 ff ff ff ff 70 d4 12 67 81 88 ff ff  .P......p..g....
> >>>     01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>   backtrace:
> >>>     [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
> >>>     [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
> >>>     [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
> >>>     [<00000000aade682c>] blk_alloc_queue+0x400/0x840
> >>>     [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
> >>>     [<00000000cbff6d39>] nvme_rdma_setup_ctrl+0x4ca/0x15f0 [nvme_rdma]
> >>>     [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
> >>>     [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
> >>>     [<0000000031d8624b>] vfs_write+0x17e/0x9a0
> >>>     [<00000000471d7945>] ksys_write+0xf1/0x1c0
> >>>     [<00000000a963bc79>] do_syscall_64+0x3a/0x80
> >>>     [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>> unreferenced object 0xffff8883e398a700 (size 192):
> >>>   comm "nvme", pid 2632, jiffies 4295317782 (age 2951.466s)
> >>>   hex dump (first 32 bytes):
> >>>     80 50 84 a3 ff ff ff ff 60 c8 12 67 81 88 ff ff  .P......`..g....
> >>>     01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>   backtrace:
> >>>     [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
> >>>     [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
> >>>     [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
> >>>     [<00000000aade682c>] blk_alloc_queue+0x400/0x840
> >>>     [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
> >>>     [<000000004f80b965>] nvme_rdma_setup_ctrl+0xf37/0x15f0 [nvme_rdma]
> >>>     [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
> >>>     [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
> >>>     [<0000000031d8624b>] vfs_write+0x17e/0x9a0
> >>>     [<00000000471d7945>] ksys_write+0xf1/0x1c0
> >>>     [<00000000a963bc79>] do_syscall_64+0x3a/0x80
> >>>     [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae
> >>> unreferenced object 0xffff8894253d9d00 (size 192):
> >>>   comm "nvme", pid 2632, jiffies 4295331915 (age 2937.333s)
> >>>   hex dump (first 32 bytes):
> >>>     80 50 84 a3 ff ff ff ff 80 e0 12 67 81 88 ff ff  .P.........g....
> >>>     01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
> >>>   backtrace:
> >>>     [<00000000ecf84f29>] kmem_cache_alloc_trace+0x10e/0x220
> >>>     [<0000000099bbcbaa>] blk_iolatency_init+0x4e/0x380
> >>>     [<00000000e7a59176>] blkcg_init_queue+0x12e/0x610
> >>>     [<00000000aade682c>] blk_alloc_queue+0x400/0x840
> >>>     [<000000007ed43824>] blk_mq_init_queue_data+0x6a/0x100
> >>>     [<000000009f9abba5>] nvme_rdma_setup_ctrl.cold.70+0x5ee/0xb01 [nvme_rdma]
> >>>     [<00000000a309d26c>] nvme_rdma_create_ctrl+0x7e5/0xa9f [nvme_rdma]
> >>>     [<000000007d8b5cca>] nvmf_dev_write+0x44e/0xa39 [nvme_fabrics]
> >>>     [<0000000031d8624b>] vfs_write+0x17e/0x9a0
> >>>     [<00000000471d7945>] ksys_write+0xf1/0x1c0
> >>>     [<00000000a963bc79>] do_syscall_64+0x3a/0x80
> >>>     [<0000000005154fc2>] entry_SYSCALL_64_after_hwframe+0x44/0xae

--
Best Regards,
  Yi Zhang
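The reproduction recipe quoted in [2] can be wrapped in a small script so that repeated connect/reset/disconnect attempts are less error-prone. This is only a sketch, assuming nvme-cli and a kernel with kmemleak enabled and debugfs mounted. The report elides the actual connect arguments ("nvme connect to target"), so TRANSPORT, TRADDR, TRSVCID, SUBSYSNQN and CTRL_DEV below are hypothetical placeholders, and the script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Hypothetical wrapper around the connect/reset/disconnect + kmemleak recipe.
# TRANSPORT, TRADDR, TRSVCID, SUBSYSNQN and CTRL_DEV are placeholder values,
# not taken from the report; adjust them for your own target.
# Dry-run by default: set DRY_RUN=0 (as root) to actually execute the commands.
set -eu

TRANSPORT=${TRANSPORT:-tcp}
TRADDR=${TRADDR:-192.0.2.10}
TRSVCID=${TRSVCID:-4420}
SUBSYSNQN=${SUBSYSNQN:-testnqn}
CTRL_DEV=${CTRL_DEV:-/dev/nvme0}
KMEMLEAK=/sys/kernel/debug/kmemleak

# Print the command in dry-run mode; otherwise run it through a shell so
# that redirections such as "echo scan > file" work.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        printf '%s\n' "$*"
    else
        sh -c "$*"
    fi
}

# One pass of the recipe, with the same sleeps as the original steps.
repro_once() {
    run "nvme connect -t $TRANSPORT -a $TRADDR -s $TRSVCID -n $SUBSYSNQN"
    run "nvme reset $CTRL_DEV"
    run "nvme disconnect-all"
    run "sleep 10"
    run "echo scan > $KMEMLEAK"   # ask kmemleak for an immediate scan
    run "sleep 60"
    run "cat $KMEMLEAK"           # dump any unreferenced objects found
}

repro_once
```

For example, running it as root with `DRY_RUN=0 TRANSPORT=rdma TRADDR=<target address> sh repro.sh` (the file name is arbitrary) would perform one cycle against an RDMA target. Defaulting to a dry run is a deliberate choice, so that an accidental invocation does not reset or disconnect a live controller.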