On Fri, Nov 16, 2018 at 03:50:57AM +0200, Leon Romanovsky wrote:
> From: Parav Pandit <parav@xxxxxxxxxxxx>
>
> Currently, when an rdma device is being removed, get resource info can
> race with the device removal, as in the example below.
>
> CPU-0                          CPU-1
> --------                       --------
> rdma_nl_rcv_msg()
>  nldev_res_get_cq_dumpit()
>    mutex_lock(device_lock);
>    get device reference
>    mutex_unlock(device_lock);  [..]
>                                ib_unregister_device()
>                                /* Valid reference to
>                                 * device->dev exists.
>                                 */
>                                ib_dealloc_device()
>
>    [..]
>    provider->fill_res_entry();
>
> Even though the device object is not freed, fill_res_entry() can get
> called on a device which no longer exists from the provider driver's side.
> The kernel core device reference count is not sufficient.
>
> A similar race can occur between device renaming and device removal, where
> device_rename() tries to rename an unregistered device. While this is fine
> for devices of a class which is not net namespace aware, it is incorrect
> for the net namespace aware class coming in a subsequent series.
> If a class is net namespace aware, then the call trace [1] below is
> observed in the above situation.
>
> Therefore, to avoid such races, keep a reference count and let device
> unregistration wait until all netlink users drop their reference.
>
> [1] Call trace:
> kernfs: ns required in 'infiniband' for 'mlx5_0'
> WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842 kernfs_find_ns+0x104/0x120
> libahci i2c_core mlxfw libata dca [last unloaded: devlink]
> RIP: 0010:kernfs_find_ns+0x104/0x120
> Call Trace:
>  kernfs_find_and_get_ns+0x2e/0x50
>  sysfs_rename_link_ns+0x40/0xb0
>  device_rename+0xb2/0xf0
>  ib_device_rename+0xb3/0x100 [ib_core]
>  nldev_set_doit+0x165/0x190 [ib_core]
>  rdma_nl_rcv_msg+0x249/0x250 [ib_core]
>  ? netlink_deliver_tap+0x8f/0x3e0
>  rdma_nl_rcv+0xd6/0x120 [ib_core]
>  netlink_unicast+0x17c/0x230
>  netlink_sendmsg+0x2f0/0x3e0
>  sock_sendmsg+0x30/0x40
>  __sys_sendto+0xdc/0x160
>
> Fixes: da5c85078215 ("RDMA/nldev: add driver-specific resource tracking")
> Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> ---
> Changelog v0->v1:
>  * Fixed typo in comment
>  * Rephrased comment
> ---
>  drivers/infiniband/core/core_priv.h |  1 +
>  drivers/infiniband/core/device.c    | 27 +++++++++++++++++++++++----
>  drivers/infiniband/core/nldev.c     | 20 ++++++++++----------
>  include/rdma/ib_verbs.h             |  8 +++++++-
>  4 files changed, 41 insertions(+), 15 deletions(-)
>
> --
> 2.19.1

This patch also doesn't apply, but the fix seemed simple enough. Again,
please check. I also reworded some of the comments.

Applying: RDMA/core: Sync unregistration with netlink commands
Using index info to reconstruct a base tree...
M	drivers/infiniband/core/core_priv.h
M	drivers/infiniband/core/device.c
M	include/rdma/ib_verbs.h
Falling back to patching base and 3-way merge...
Auto-merging include/rdma/ib_verbs.h
Auto-merging drivers/infiniband/core/device.c
CONFLICT (content): Merge conflict in drivers/infiniband/core/device.c
Auto-merging drivers/infiniband/core/core_priv.h
error: Failed to merge in the changes.

Applied to for-next.

Thanks,
Jason
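The fix described in the quoted patch (netlink users hold a reference count, and unregistration waits until that count drops to zero) can be illustrated with a small userspace sketch. This is not the patch's code: it assumes C11 atomics stand in for the kernel's refcount_t and refcount_inc_not_zero(), and a pthread mutex/condition pair stands in for struct completion. Every name below (sketch_dev, sketch_dev_try_get(), sketch_nl_fill_res(), and so on) is hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the kernel's struct completion. */
struct sketch_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void completion_init(struct sketch_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Mark the completion done and wake every waiter. */
static void completion_signal(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Block until completion_signal() has been called. */
static void completion_wait(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Hypothetical device: refcount starts at 1 (the registration reference). */
struct sketch_dev {
	atomic_int refcount;
	struct sketch_completion unreg_done; /* signalled when refcount hits 0 */
};

static void sketch_dev_register(struct sketch_dev *d)
{
	atomic_init(&d->refcount, 1);
	completion_init(&d->unreg_done);
}

/* Like refcount_inc_not_zero(): succeed only while the device is registered. */
static bool sketch_dev_try_get(struct sketch_dev *d)
{
	int old = atomic_load(&d->refcount);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&d->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; the last one wakes the unregistering thread. */
static void sketch_dev_put(struct sketch_dev *d)
{
	if (atomic_fetch_sub(&d->refcount, 1) == 1)
		completion_signal(&d->unreg_done);
}

/* Drop the registration reference, then wait for all netlink users. */
static void sketch_dev_unregister(struct sketch_dev *d)
{
	sketch_dev_put(d);
	completion_wait(&d->unreg_done);
	/* Refcount is now 0, so sketch_dev_try_get() fails for every later
	 * caller and provider state can be torn down without racing them. */
}

/* Hypothetical netlink handler: touches provider state only under a ref. */
static bool sketch_nl_fill_res(struct sketch_dev *d, int *entries)
{
	if (!sketch_dev_try_get(d))
		return false;   /* device is going away; caller reports an error */
	(*entries)++;       /* stands in for provider->fill_res_entry() */
	sketch_dev_put(d);
	return true;
}
```

Because the try-get refuses a zero count, a netlink request that arrives after unregistration has started fails early instead of reaching the provider's fill_res_entry(), while a request already in flight simply delays unregistration until it drops its reference.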