> -----Original Message----- > From: Jason Gunthorpe <jgg@xxxxxxxx> > Sent: Thursday, November 22, 2018 1:42 PM > To: Leon Romanovsky <leon@xxxxxxxxxx> > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Parav Pandit > <parav@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>; > Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>; Leon Romanovsky > <leonro@xxxxxxxxxxxx> > Subject: Re: [PATCH rdma-next v1] RDMA/core: Sync unregistration with > netlink commands > > On Fri, Nov 16, 2018 at 03:50:57AM +0200, Leon Romanovsky wrote: > > From: Parav Pandit <parav@xxxxxxxxxxxx> > > > > Currently when rdma device is getting removed, get resource info can > > race with device removal example below. > > > > CPU-0 CPU-1 > > -------- -------- > > rdma_nl_rcv_msg() > > nldev_res_get_cq_dumpit() > > mutex_lock(device_lock); > > get device reference > > mutex_unlock(device_lock); [..] > > ib_unregister_device() > > /* Valid reference to > > * device->dev exists. > > */ > > ib_dealloc_device() > > > > [..] > > provider->fill_res_entry(); > > > > Even though device object is not freed, fill_res_entry() can get > > called on device which doesn't from provider driver side. > > Kernel core device reference count is not sufficient. > > > > Similar race can occur with device renaming and device removal, where > > device_rename() tries to rename a unregistered device. While this is > > fine for devices of a class which are not net namespace aware, but it > > is incorrect for net namespace aware class coming in subsequent series. > > If a class is net namespace aware, than below [1] call trace is > > observed in above situation. > > > > Therefore, to avoid the such race, keep a reference count and let > > device unregistration wait until all netlink users drop the reference. > > > > [1] Call trace: > > kernfs: ns required in 'infiniband' for 'mlx5_0' > > WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842 > > kernfs_find_ns+0x104/0x120 libahci i2c_core mlxfw libata dca [last > > unloaded: devlink] > > RIP: 0010:kernfs_find_ns+0x104/0x120 > > Call Trace: > > kernfs_find_and_get_ns+0x2e/0x50 > > sysfs_rename_link_ns+0x40/0xb0 > > device_rename+0xb2/0xf0 > > ib_device_rename+0xb3/0x100 [ib_core] > > nldev_set_doit+0x165/0x190 [ib_core] > > rdma_nl_rcv_msg+0x249/0x250 [ib_core] > > ? netlink_deliver_tap+0x8f/0x3e0 > > rdma_nl_rcv+0xd6/0x120 [ib_core] > > netlink_unicast+0x17c/0x230 > > netlink_sendmsg+0x2f0/0x3e0 > > sock_sendmsg+0x30/0x40 > > __sys_sendto+0xdc/0x160 > > > > Fixes: da5c85078215 ("RDMA/nldev: add driver-specific resource > > tracking") > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > --- > > Changelog v0->v1: > > * Fixed type in comment > > * Rephrased comment > > --- > > drivers/infiniband/core/core_priv.h | 1 + > > drivers/infiniband/core/device.c | 27 +++++++++++++++++++++++---- > > drivers/infiniband/core/nldev.c | 20 ++++++++++---------- > > include/rdma/ib_verbs.h | 8 +++++++- > > 4 files changed, 41 insertions(+), 15 deletions(-) > > > > -- > > 2.19.1 > > This patch also doesn't apply, but the fix seemed simple enough. Again > please check. I also reworded some of the comments. > I checked. Looks good. Thanks for doing it. > Applying: RDMA/core: Sync unregistration with netlink commands Using > index info to reconstruct a base tree... > M drivers/infiniband/core/core_priv.h > M drivers/infiniband/core/device.c > M include/rdma/ib_verbs.h > Falling back to patching base and 3-way merge... > Auto-merging include/rdma/ib_verbs.h > Auto-merging drivers/infiniband/core/device.c CONFLICT (content): Merge > conflict in drivers/infiniband/core/device.c Auto-merging > drivers/infiniband/core/core_priv.h > error: Failed to merge in the changes. > > Applied to for-next > > Thanks, > Jason