Re: [bug report] kmemleak in rdma_core observed during blktests nvme/rdma use siw

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Fri, Apr 26, 2024 at 1:56 PM Yi Zhang <yi.zhang@xxxxxxxxxx> wrote:
>
> On Wed, Apr 24, 2024 at 9:28 PM Guoqing Jiang <guoqing.jiang@xxxxxxxxx> wrote:
> >
> > Hi,
> >
> > On 4/8/24 14:03, Yi Zhang wrote:
> > > Hi
> > > I found the below kmemleak issue during blktests nvme/rdma on the
> > > latest linux-rdma/for-next, please help check it and let me know if
> > > you need any info/testing for it, thanks.
> >
> > Could you share which test case caused the issue? I can't reproduce
> > it with 6.9-rc3+ kernel (commit 586b5dfb51b) with the below.
>
> It can be reproduced by [1], you can find more info from the symbol
> info[2], I also attached the config file, maybe you can this config
> file

Just attached the config file

>
> [1] nvme_trtype=rdma ./check nvme/012
> [2]
> unreferenced object 0xffff8883a87e8800 (size 192):
>   comm "rdma", pid 2355, jiffies 4294836069
>   hex dump (first 32 bytes):
>     32 00 00 00 00 00 00 00 c0 ff ff ff 1f 00 00 00  2...............
>     10 88 7e a8 83 88 ff ff 10 88 7e a8 83 88 ff ff  ..~.......~.....
>   backtrace (crc 4db191c4):
>     [<ffffffff8cd251bd>] kmalloc_trace+0x30d/0x3b0
>     [<ffffffffc207eff7>] alloc_gid_entry+0x47/0x380 [ib_core]
>     [<ffffffffc2080206>] add_modify_gid+0x166/0x930 [ib_core]
>     [<ffffffffc2081468>] ib_cache_update.part.0+0x6d8/0x910 [ib_core]
>     [<ffffffffc2082e1a>] ib_cache_setup_one+0x24a/0x350 [ib_core]
>     [<ffffffffc207749e>] ib_register_device+0x9e/0x3a0 [ib_core]
>     [<ffffffffc24ac389>] 0xffffffffc24ac389
>     [<ffffffffc20c6cd8>] nldev_newlink+0x2b8/0x520 [ib_core]
>     [<ffffffffc2083fe3>] rdma_nl_rcv_msg+0x2c3/0x520 [ib_core]
>     [<ffffffffc208448c>]
> rdma_nl_rcv_skb.constprop.0.isra.0+0x23c/0x3a0 [ib_core]
>     [<ffffffff8e30e715>] netlink_unicast+0x445/0x710
>     [<ffffffff8e30f151>] netlink_sendmsg+0x761/0xc40
>     [<ffffffff8e09da89>] __sys_sendto+0x3a9/0x420
>     [<ffffffff8e09dbec>] __x64_sys_sendto+0xdc/0x1b0
>     [<ffffffff8e9afad3>] do_syscall_64+0x93/0x180
>     [<ffffffff8ea00126>] entry_SYSCALL_64_after_hwframe+0x71/0x79
>
> (gdb) l *(alloc_gid_entry+0x47)
> 0x2eff7 is in alloc_gid_entry (./include/linux/slab.h:628).
> 623
> 624 if (size > KMALLOC_MAX_CACHE_SIZE)
> 625 return kmalloc_large(size, flags);
> 626
> 627 index = kmalloc_index(size);
> 628 return kmalloc_trace(
> 629 kmalloc_caches[kmalloc_type(flags, _RET_IP_)][index],
> 630 flags, size);
> 631 }
> 632 return __kmalloc(size, flags);
>
> (gdb) l *(add_modify_gid+0x166)
> 0x30206 is in add_modify_gid (drivers/infiniband/core/cache.c:447).
> 442 * empty table entries instead of storing them.
> 443 */
> 444 if (rdma_is_zero_gid(&attr->gid))
> 445 return 0;
> 446
> 447 entry = alloc_gid_entry(attr);
> 448 if (!entry)
> 449 return -ENOMEM;
> 450
> 451 if (rdma_protocol_roce(attr->device, attr->port_num)) {
>
>
> >
> > use_siw=1 nvme_trtype=rdma ./check nvme/
> >
> > > # dmesg | grep kmemleak
> > > [   67.130652] kmemleak: Kernel memory leak detector initialized (mem
> > > pool available: 36041)
> > > [   67.130728] kmemleak: Automatic memory scanning thread started
> > > [ 1051.771867] kmemleak: 2 new suspected memory leaks (see
> > > /sys/kernel/debug/kmemleak)
> > > [ 1832.796189] kmemleak: 8 new suspected memory leaks (see
> > > /sys/kernel/debug/kmemleak)
> > > [ 2578.189075] kmemleak: 17 new suspected memory leaks (see
> > > /sys/kernel/debug/kmemleak)
> > > [ 3330.710984] kmemleak: 4 new suspected memory leaks (see
> > > /sys/kernel/debug/kmemleak)
> > >
> > > unreferenced object 0xffff88855da53400 (size 192):
> > >    comm "rdma", pid 10630, jiffies 4296575922
> > >    hex dump (first 32 bytes):
> > >      37 00 00 00 00 00 00 00 c0 ff ff ff 1f 00 00 00  7...............
> > >      10 34 a5 5d 85 88 ff ff 10 34 a5 5d 85 88 ff ff  .4.].....4.]....
> > >    backtrace (crc 47f66721):
> > >      [<ffffffff911251bd>] kmalloc_trace+0x30d/0x3b0
> > >      [<ffffffffc2640ff7>] alloc_gid_entry+0x47/0x380 [ib_core]
> > >      [<ffffffffc2642206>] add_modify_gid+0x166/0x930 [ib_core]
> >
> > I guess add_modify_gid is called from config_non_roce_gid_cache, not sure
> > why we don't check the return value of it here.
> >
> > Looks put_gid_entry is called in case add_modify_gid returns failure, it
> > would
> > trigger schedule_free_gid -> queue_work(ib_wq, &entry->del_work), then
> > free_gid_work -> free_gid_entry_locked would free storage asynchronously by
> > put_gid_ndev and also entry.
> >
> > >      [<ffffffffc2643468>] ib_cache_update.part.0+0x6d8/0x910 [ib_core]
> > >      [<ffffffffc2644e1a>] ib_cache_setup_one+0x24a/0x350 [ib_core]
> > >      [<ffffffffc263949e>] ib_register_device+0x9e/0x3a0 [ib_core]
> > >      [<ffffffffc2a3d389>] 0xffffffffc2a3d389
> > >      [<ffffffffc2688cd8>] nldev_newlink+0x2b8/0x520 [ib_core]
> > >      [<ffffffffc2645fe3>] rdma_nl_rcv_msg+0x2c3/0x520 [ib_core]
> > >      [<ffffffffc264648c>]
> > > rdma_nl_rcv_skb.constprop.0.isra.0+0x23c/0x3a0 [ib_core]
> > >      [<ffffffff9270e7b5>] netlink_unicast+0x445/0x710
> > >      [<ffffffff9270f1f1>] netlink_sendmsg+0x761/0xc40
> > >      [<ffffffff9249db29>] __sys_sendto+0x3a9/0x420
> > >      [<ffffffff9249dc8c>] __x64_sys_sendto+0xdc/0x1b0
> > >      [<ffffffff92db0ad3>] do_syscall_64+0x93/0x180
> > >      [<ffffffff92e00126>] entry_SYSCALL_64_after_hwframe+0x71/0x79
> >
> > After ib_cache_setup_one failed, maybe ib_cache_cleanup_one is needed
> > which flush ib_wq to ensure storage is freed. Could you try with the change?
> Will try it later.
>

The kmemleak still can be reproduced with this change:

unreferenced object 0xffff8881f89fde00 (size 192):
  comm "rdma", pid 8708, jiffies 4295703453
  hex dump (first 32 bytes):
    02 00 00 00 00 00 00 00 c0 ff ff ff 1f 00 00 00  ................
    10 de 9f f8 81 88 ff ff 10 de 9f f8 81 88 ff ff  ................
  backtrace (crc 888c494b):
    [<ffffffffa7d251bd>] kmalloc_trace+0x30d/0x3b0
    [<ffffffffc1efeff7>] alloc_gid_entry+0x47/0x380 [ib_core]
    [<ffffffffc1f00206>] add_modify_gid+0x166/0x930 [ib_core]
    [<ffffffffc1f01468>] ib_cache_update.part.0+0x6d8/0x910 [ib_core]
    [<ffffffffc1f02e1a>] ib_cache_setup_one+0x24a/0x350 [ib_core]
    [<ffffffffc1ef749e>] ib_register_device+0x9e/0x3a0 [ib_core]
    [<ffffffffc22ee389>]
siw_qp_state_to_ib_qp_state+0x28a9/0xfffffffffffd1520 [siw]
    [<ffffffffc1f46cd8>] nldev_newlink+0x2b8/0x520 [ib_core]
    [<ffffffffc1f03fe3>] rdma_nl_rcv_msg+0x2c3/0x520 [ib_core]
    [<ffffffffc1f0448c>]
rdma_nl_rcv_skb.constprop.0.isra.0+0x23c/0x3a0 [ib_core]
    [<ffffffffa930e715>] netlink_unicast+0x445/0x710
    [<ffffffffa930f151>] netlink_sendmsg+0x761/0xc40
    [<ffffffffa909da89>] __sys_sendto+0x3a9/0x420
    [<ffffffffa909dbec>] __x64_sys_sendto+0xdc/0x1b0
    [<ffffffffa99afad3>] do_syscall_64+0x93/0x180
    [<ffffffffa9a00126>] entry_SYSCALL_64_after_hwframe+0x71/0x79

> >
> > --- a/drivers/infiniband/core/device.c
> > +++ b/drivers/infiniband/core/device.c
> > @@ -1388,7 +1388,7 @@ int ib_register_device(struct ib_device *device,
> > const char *name,
> >          if (ret) {
> >                  dev_warn(&device->dev,
> >                           "Couldn't set up InfiniBand P_Key/GID cache\n");
> > -               return ret;
> > +               goto cache_cleanup;
> >          }
> >
> > Thanks,
> > Guoqing
> >
>
>
> --
> Best Regards,
>   Yi Zhang



-- 
Best Regards,
  Yi Zhang

Attachment: config
Description: Binary data


[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux