RE: [PATCH rdma-next 1/2] RDMA/core: Introduce a helper function to change net namespace of rdma device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Jason Gunthorpe <jgg@xxxxxxxx>
> Sent: Thursday, March 28, 2019 11:58 AM
> To: Leon Romanovsky <leon@xxxxxxxxxx>
> Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky
> <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>;
> Parav Pandit <parav@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 1/2] RDMA/core: Introduce a helper function
> to change net namespace of rdma device
> 
> On Tue, Feb 26, 2019 at 02:04:28PM +0200, Leon Romanovsky wrote:
> > From: Parav Pandit <parav@xxxxxxxxxxxx>
> >
> > Introduce a helper function that changes rdma device's net namespace
> > which performs mini disable enable sequence to have device visible
> > only in assigned net namespace.
> >
> > device unregistration, device rename and device change net namespace
> > may be invoked concurrently.
> > (a) device unregistration needs to wait until device change (rename or
> > net namespace change) operation is in progress.
> > (b) device net namespace change should not proceed if the
> > unregistration has started.
> > (c) while one cpu is changing device net namespace, other cpu should
> > not be able to rename or change net namespace.
> >
> > To address above concurrency,
> > (a) Use unreg_mutex to synchronize between ib_unregister_device() and
> > net namespace change operation
> > (b) In cases where unregister_device() has started unregistration
> > before change_netns got chance to acquire unreg_mutex, validate the
> > refcount - if it dropped to zero, abort the net namespace change
> operation.
> >
> > Finally use the helper function to change net namespace of ib device
> > to move the device back to init_net when such net is deleted.
> >
> > Signed-off-by: Parav Pandit <parav@xxxxxxxxxxxx>
> > Signed-off-by: Leon Romanovsky <leonro@xxxxxxxxxxxx>
> > drivers/infiniband/core/device.c | 121 ++++++++++++++++++++++++++++++-
> >  1 file changed, 120 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/infiniband/core/device.c
> > b/drivers/infiniband/core/device.c
> > index 477b787bd083..48805e00b115 100644
> > +++ b/drivers/infiniband/core/device.c
> > @@ -201,6 +201,8 @@ static struct notifier_block ibdev_lsm_nb = {
> >  	.notifier_call = ib_security_change,  };
> >
> > +static int __rdma_dev_change_netns(struct ib_device *device, struct
> > +net *net);
> > +
> >  /* Pointer to the RCU head at the start of the ib_port_data array */
> > struct ib_port_data_rcu {
> >  	struct rcu_head rcu_head;
> > @@ -861,6 +863,14 @@ static int add_compat_devs(struct ib_device
> *device)
> >  	unsigned long index;
> >  	int ret = 0;
> >
> > +	/* If we didn't get a reference for a registered it must be
> > +	 * undergoing a net namespace change via netlink command.
> > +	 * In such case its not an error and no need to add any compat
> > +	 * devices either.
> > +	 */
> > +	if (!ib_device_try_get(device))
> > +		return 0;
> 
> The only place that calls add_compat_devs is enable_device_and_get, which
> is done under the devices_rwsem. It is not needed to try_get, nor is it
> possible to fail.
> 
> This should just get replaced by
> 
> 	lockdep_assert_held(&devices_rwsem);
> 
> 
> > +
> >  	down_read(&rdma_nets_rwsem);
> >  	xa_for_each (&rdma_nets, index, rnet) {
> >  		ret = add_one_compat_dev(device, rnet); @@ -868,6 +878,7
> @@ static
> > int add_compat_devs(struct ib_device *device)
> >  			break;
> >  	}
> >  	up_read(&rdma_nets_rwsem);
> > +	ib_device_put(device);
> >  	return ret;
> >  }
> >
> > @@ -952,6 +963,29 @@ int rdma_compatdev_set(u8 enable)
> >  	return ret;
> >  }
> >
> > +static void devices_to_default_net(struct net *net) {
> > +	struct ib_device *dev;
> > +	unsigned long index;
> > +
> > +	down_read(&devices_rwsem);
> > +	/* If net namespace change is initiated by the time we reach here,
> > +	 * or if device unregistration has started, don't consider those
> > +	 * devices.
> > +	 */
> > +	xa_for_each_marked (&devices, index, dev, DEVICE_REGISTERED) {
> > +		if (!net_eq(net, read_pnet(&dev->coredev.rdma_net)))
> > +			continue;
> > +
> > +		up_read(&devices_rwsem);
> > +		mutex_lock(&dev->unregistration_lock);
> 
> How does this work? We dropped devices_rwsem so 'dev' is invalid now
> 
get_device() and refcount check for zero is missing before up_read(). 
(similar to rdma_dev_change_netns_with_put()).
Refcount zero check is needed to make sure that device didn't get unregistered before acquiring the unreg_mutex.
Because calling device_rename() on device which has undergone device_del() is incorrect..

> > +		__rdma_dev_change_netns(dev, &init_net);
> > +		mutex_unlock(&dev->unregistration_lock);
> > +		down_read(&devices_rwsem);
> > +	}
> > +	up_read(&devices_rwsem);
> > +}
> > +
> >  static void rdma_dev_exit_net(struct net *net)  {
> >  	struct rdma_dev_net *rnet = net_generic(net, rdma_dev_net_id);
> @@
> > -983,7 +1017,7 @@ static void rdma_dev_exit_net(struct net *net)
> >  	}
> >  	up_read(&devices_rwsem);
> >
> > -	xa_erase(&rdma_nets, rnet->id);
> > +	devices_to_default_net(net);
> 
> ??? Still need that xa_erase...
>
Crap. Yes, it's needed.
 
> 
> >  }
> >
> >  static __net_init int rdma_dev_init_net(struct net *net) @@ -1433,6
> > +1467,91 @@ void ib_unregister_device_queued(struct ib_device *ib_dev)
> > }  EXPORT_SYMBOL(ib_unregister_device_queued);
> >
> > +static int __rdma_dev_change_netns(struct ib_device *device, struct
> > +net *net) {
> > +	int ret;
> > +
> > +	disable_device(device);
> > +
> > +	/* At this point no one would be using the device,
> > +	 * so it is safe to change the device net namespace.
> > +	 */
> > +	write_pnet(&device->coredev.rdma_net, net);
> > +	/* It is safe to rename the device here without holding
> > +	 * devices_rwsem, because all references to the device would
> > +	 * have dropped by now.
> > +	 */
> 
> This is misleading.. We are not actually renaming here, it is just updating the
> namespace. This is safe because the names are currently forced to be
> globally unique and so the new name in the new namespace is guaranteed
> free.
Yes, I will change the comment as - 'It is safe to change rename the device as rdma device names are globally unique".

> 
> The lack of a lock is only ok because nothing can be changin dev_name in
> parallel.
> 
> > +	if (ret) {
> > +		dev_warn(&device->dev, "%s Couldn't rename device\n",
> __func__);
> > +		return ret;
> > +	}
> > +	ret = enable_device_and_get(device);
> > +	if (ret) {
> > +		/* This shouldn't really happen, but if it does, let
> > +		 * user retry at later point. So don't disable the
> > +		 * device.
> > +		 */
> > +		dev_warn(&device->dev,
> > +			 "%s Couldn't re-enable device\n", __func__);
> > +	}
> 
> I'm not so sure about this, at worst it means we have a device that failed
> 'enable_driver' in the active list.. Only rxe uses this, and it can't fail, so
> maybe it is OK.
> 
> Also, disable_device calls free_netdevs() which is not restored by
> enable_device, so that needs some revisions too. Probably we can just move
> that call to free_netdevs() to unregister.
> 
I guess I overlooked this or when I posted the patches, it wasn't in tree. Not sure.
But yes, I can move to ib_unregister_device(). Its already called in ib_dealloc_device() and ib_device_release().

> Jason




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux