> -----Original Message----- > From: Jason Gunthorpe <jgg@xxxxxxxx> > Sent: Thursday, March 28, 2019 11:58 AM > To: Leon Romanovsky <leon@xxxxxxxxxx> > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky > <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>; > Parav Pandit <parav@xxxxxxxxxxxx> > Subject: Re: [PATCH rdma-next 1/2] RDMA/core: Introduce a helper function > to change net namespace of rdma device > > On Tue, Feb 26, 2019 at 02:04:28PM +0200, Leon Romanovsky wrote: > > From: Parav Pandit <parav@xxxxxxxxxxxx> > > > > Introduce a helper function that changes rdma device's net namespace > > which performs mini disable enable sequence to have device visible > > only in assigned net namespace. > > > > device unregistration, device rename and device change net namespace > > may be invoked concurrently. > > (a) device unregistration needs to wait until device change (rename or > > net namespace change) operation is in progress. > > (b) device net namespace change should not proceed if the > > unregistration has started. > > (c) while one cpu is changing device net namespace, other cpu should > > not be able to rename or change net namespace. > > > > To address above concurrency, > > (a) Use unreg_mutex to synchronize between ib_unregister_device() and > > net namespace change operation > > (b) In cases where unregister_device() has started unregistration > > before change_netns got chance to acquire unreg_mutex, validate the > > refcount - if it dropped to zero, abort the net namespace change > operation. > > > > Finally use the helper function to change net namespace of ib device > > to move the device back to init_net when such net is deleted. > > > > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx> > > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx> > > drivers/infiniband/core/device.c | 121 ++++++++++++++++++++++++++++++- > > 1 file changed, 120 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/infiniband/core/device.c > > b/drivers/infiniband/core/device.c > > index 477b787bd083..48805e00b115 100644 > > +++ b/drivers/infiniband/core/device.c > > @@ -201,6 +201,8 @@ static struct notifier_block ibdev_lsm_nb = { > > .notifier_call = ib_security_change, }; > > > > +static int __rdma_dev_change_netns(struct ib_device *device, struct > > +net *net); > > + > > /* Pointer to the RCU head at the start of the ib_port_data array */ > > struct ib_port_data_rcu { > > struct rcu_head rcu_head; > > @@ -861,6 +863,14 @@ static int add_compat_devs(struct ib_device > *device) > > unsigned long index; > > int ret = 0; > > > > + /* If we didn't get a reference for a registered it must be > > + * undergoing a net namespace change via netlink command. > > + * In such case its not an error and no need to add any compat > > + * devices either. > > + */ > > + if (!ib_device_try_get(device)) > > + return 0; > > The only place that calls add_compat_devs is enable_device_and_get, which > is done under the devices_rwsem. It is not needed to try_get, nor is it > possible to fail. > > This should just get replaced by > > lockdep_assert_held(&devices_rwsem); > > > > + > > down_read(&rdma_nets_rwsem); > > xa_for_each (&rdma_nets, index, rnet) { > > ret = add_one_compat_dev(device, rnet); @@ -868,6 +878,7 > @@ static > > int add_compat_devs(struct ib_device *device) > > break; > > } > > up_read(&rdma_nets_rwsem); > > + ib_device_put(device); > > return ret; > > } > > > > @@ -952,6 +963,29 @@ int rdma_compatdev_set(u8 enable) > > return ret; > > } > > > > +static void devices_to_default_net(struct net *net) { > > + struct ib_device *dev; > > + unsigned long index; > > + > > + down_read(&devices_rwsem); > > + /* If net namespace change is initiated by the time we reach here, > > + * or if device unregistration has started, don't consider those > > + * devices. > > + */ > > + xa_for_each_marked (&devices, index, dev, DEVICE_REGISTERED) { > > + if (!net_eq(net, read_pnet(&dev->coredev.rdma_net))) > > + continue; > > + > > + up_read(&devices_rwsem); > > + mutex_lock(&dev->unregistration_lock); > > How does this work? We dropped devices_rwsem so 'dev' is invalid now > get_device() and refcount check for zero is missing before up_read(). (similar to rdma_dev_change_netns_with_put()). Refcount zero check is needed to make sure that device didn't get unregistered before acquiring the unreg_mutex. Because calling device_rename() on device which has undergone device_del() is incorrect.. > > + __rdma_dev_change_netns(dev, &init_net); > > + mutex_unlock(&dev->unregistration_lock); > > + down_read(&devices_rwsem); > > + } > > + up_read(&devices_rwsem); > > +} > > + > > static void rdma_dev_exit_net(struct net *net) { > > struct rdma_dev_net *rnet = net_generic(net, rdma_dev_net_id); > @@ > > -983,7 +1017,7 @@ static void rdma_dev_exit_net(struct net *net) > > } > > up_read(&devices_rwsem); > > > > - xa_erase(&rdma_nets, rnet->id); > > + devices_to_default_net(net); > > ??? Still need that xa_erase... > Crap. Yes, it's needed. > > > } > > > > static __net_init int rdma_dev_init_net(struct net *net) @@ -1433,6 > > +1467,91 @@ void ib_unregister_device_queued(struct ib_device *ib_dev) > > } EXPORT_SYMBOL(ib_unregister_device_queued); > > > > +static int __rdma_dev_change_netns(struct ib_device *device, struct > > +net *net) { > > + int ret; > > + > > + disable_device(device); > > + > > + /* At this point no one would be using the device, > > + * so it is safe to change the device net namespace. > > + */ > > + write_pnet(&device->coredev.rdma_net, net); > > + /* It is safe to rename the device here without holding > > + * devices_rwsem, because all references to the device would > > + * have dropped by now. > > + */ > > This is misleading.. We are not actually renaming here, it is just updating the > namespace. This is safe because the names are currently forced to be > globally unique and so the new name in the new namespace is guaranteed > free. Yes, I will change the comment as - 'It is safe to change rename the device as rdma device names are globally unique". > > The lack of a lock is only ok because nothing can be changin dev_name in > parallel. > > > + if (ret) { > > + dev_warn(&device->dev, "%s Couldn't rename device\n", > __func__); > > + return ret; > > + } > > + ret = enable_device_and_get(device); > > + if (ret) { > > + /* This shouldn't really happen, but if it does, let > > + * user retry at later point. So don't disable the > > + * device. > > + */ > > + dev_warn(&device->dev, > > + "%s Couldn't re-enable device\n", __func__); > > + } > > I'm not so sure about this, at worst it means we have a device that failed > 'enable_driver' in the active list.. Only rxe uses this, and it can't fail, so > maybe it is OK. > > Also, disable_device calls free_netdevs() which is not restored by > enable_device, so that needs some revisions too. Probably we can just move > that call to free_netdevs() to unregister. > I guess I overlooked this or when I posted the patches, it wasn't in tree. Not sure. But yes, I can move to ib_unregister_device(). Its already called in ib_dealloc_device() and ib_device_release(). > Jason