RE: [PATCH rdma-next v1] RDMA/core: Sync unregistration with netlink commands

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Jason Gunthorpe <jgg@xxxxxxxx>
> Sent: Thursday, November 22, 2018 1:42 PM
> To: Leon Romanovsky <leon@xxxxxxxxxx>
> Cc: Doug Ledford <dledford@xxxxxxxxxx>; Parav Pandit
> <parav@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>;
> Steve Wise <swise@xxxxxxxxxxxxxxxxxxxxx>; Leon Romanovsky
> <leonro@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next v1] RDMA/core: Sync unregistration with
> netlink commands
> 
> On Fri, Nov 16, 2018 at 03:50:57AM +0200, Leon Romanovsky wrote:
> > From: Parav Pandit <parav@xxxxxxxxxxxx>
> >
> > Currently when rdma device is getting removed, get resource info can
> > race with device removal example below.
> >
> >       CPU-0                                  CPU-1
> >     --------                               --------
> >     rdma_nl_rcv_msg()
> >        nldev_res_get_cq_dumpit()
> >           mutex_lock(device_lock);
> >           get device reference
> >           mutex_unlock(device_lock);        [..]
> >                                             ib_unregister_device()
> >                                             /* Valid reference to
> >                                              * device->dev exists.
> >                                              */
> >                                              ib_dealloc_device()
> >
> >           [..]
> >           provider->fill_res_entry();
> >
> > Even though device object is not freed, fill_res_entry() can get
> > called on device which doesn't from provider driver side.
> > Kernel core device reference count is not sufficient.
> >
> > Similar race can occur with device renaming and device removal, where
> > device_rename() tries to rename a unregistered device. While this is
> > fine for devices of a class which are not net namespace aware, but it
> > is incorrect for net namespace aware class coming in subsequent series.
> > If a class is net namespace aware, than below [1] call trace is
> > observed in above situation.
> >
> > Therefore, to avoid the such race, keep a reference count and let
> > device unregistration wait until all netlink users drop the reference.
> >
> > [1] Call trace:
> > kernfs: ns required in 'infiniband' for 'mlx5_0'
> > WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842
> > kernfs_find_ns+0x104/0x120 libahci i2c_core mlxfw libata dca [last
> > unloaded: devlink]
> > RIP: 0010:kernfs_find_ns+0x104/0x120
> > Call Trace:
> > kernfs_find_and_get_ns+0x2e/0x50
> > sysfs_rename_link_ns+0x40/0xb0
> > device_rename+0xb2/0xf0
> > ib_device_rename+0xb3/0x100 [ib_core]
> > nldev_set_doit+0x165/0x190 [ib_core]
> > rdma_nl_rcv_msg+0x249/0x250 [ib_core]
> > ? netlink_deliver_tap+0x8f/0x3e0
> > rdma_nl_rcv+0xd6/0x120 [ib_core]
> > netlink_unicast+0x17c/0x230
> > netlink_sendmsg+0x2f0/0x3e0
> > sock_sendmsg+0x30/0x40
> > __sys_sendto+0xdc/0x160
> >
> > Fixes: da5c85078215 ("RDMA/nldev: add driver-specific resource
> > tracking")
> > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > ---
> > Changelog v0->v1:
> >  * Fixed type in comment
> >  * Rephrased comment
> > ---
> >  drivers/infiniband/core/core_priv.h |  1 +
> >  drivers/infiniband/core/device.c    | 27 +++++++++++++++++++++++----
> >  drivers/infiniband/core/nldev.c     | 20 ++++++++++----------
> >  include/rdma/ib_verbs.h             |  8 +++++++-
> >  4 files changed, 41 insertions(+), 15 deletions(-)
> >
> > --
> > 2.19.1
> 
> This patch also doesn't apply, but the fix seemed simple enough. Again
> please check. I also reworded some of the comments.
> 
I checked. Looks good. Thanks for doing it.

> Applying: RDMA/core: Sync unregistration with netlink commands Using
> index info to reconstruct a base tree...
> M	drivers/infiniband/core/core_priv.h
> M	drivers/infiniband/core/device.c
> M	include/rdma/ib_verbs.h
> Falling back to patching base and 3-way merge...
> Auto-merging include/rdma/ib_verbs.h
> Auto-merging drivers/infiniband/core/device.c CONFLICT (content): Merge
> conflict in drivers/infiniband/core/device.c Auto-merging
> drivers/infiniband/core/core_priv.h
> error: Failed to merge in the changes.
> 
> Applied to for-next
> 
> Thanks,
> Jason




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux