Re: [PATCH rdma-next 4/7] RDMA/core: Introduce a helper function to change net namespace of rdma device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Sat, Jan 12, 2019 at 04:34:20AM +0000, Parav Pandit wrote:
> 
> 
> > From: Jason Gunthorpe <jgg@xxxxxxxx>
> > Sent: Friday, January 11, 2019 4:43 PM
> > To: Leon Romanovsky <leon@xxxxxxxxxx>
> > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky
> > <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>;
> > Parav Pandit <parav@xxxxxxxxxxxx>
> > Subject: Re: [PATCH rdma-next 4/7] RDMA/core: Introduce a helper function
> > to change net namespace of rdma device
> > 
> > On Tue, Dec 18, 2018 at 02:28:33PM +0200, Leon Romanovsky wrote:
> > 
> > > +static int _ib_dev_change_net_ns(struct ib_device *device, struct net
> > > +*net) {
> > > +	int ret = -EINVAL;
> > > +
> > > +	if (net_eq(rdma_dev_net(device), net))
> > > +		return 0;
> > > +
> > > +	/*
> > > +	 * Currently supported only for those providers which supports
> > > +	 * disassociation and those which doesn't do port specific sysfs
> > > +	 * init. Once a port_cleanup infrastructure is implemented, this
> > > +	 * limitation will be removed.
> > > +	 */
> > > +	if (!device->ops.disassociate_ucontext || device->ops.port_init)
> > > +		return -EOPNOTSUPP;
> > 
> > This doesn't drain the refcount - why not?
> >
> Refcount is drained below in unregister_remove_clients().
> Above is just a check if device can moved to net namespace or not.

That function is badly named then

> > > +	unregister_remove_clients(device);
> > > +	rdma_compatdev_remove(device);
> > > +	ib_device_unregister_rdmacg(device);
> > > +
> > > +	ib_gid_table_cleanup_one(device);
> > > +	free_client_contexts(device);
> > > +	device->reg_state = IB_DEV_UNREGISTERED;
> > 
> > Now the device is unregistered but still on the device list? Yikes, that is quite
> > bad.
> No. no. It is not on the device list. You probably missed unregister_remove_clients().
> Device is first taken off from the list, followed by removing all clients(), which is exact reverse sequence of registration.

But that is worse! Now we have incompletely unregistered netdev that
is not on any tracking list! We are back to the thing you brough up
earlier where the name uniqueness no longer works as expected.

So I think it has to stay on the list and the list needs to handle a
partially unregistered device. I have patches for that..

> > > +	device->netns_shared = 0;
> > > +	rdma_coredev_net_set(&device->coredev, net);
> > > +
> > > +	rdma_roce_rescan_device(device);
> > > +	ib_device_cache_update(device);
> > > +
> > > +	ret = ib_device_register_rdmacg(device);
> > > +	if (ret) {
> > > +		dev_warn(&device->dev,
> > > +			 "Couldn't register device with rdma cgroup\n");
> > > +		goto out;
> > 
> > And this leaves an unregistered device on the device_list!?! That is really not
> > OK.
> No. it is not on the list as I described above.

Well, in that case we are leaking a device that is half exsting into
some kind of bogus untracked state is very bad. It plays very poorly
with the patch I already posted for Steve to do 'unregister driver' as
we have lost track of ib_devices.

> > 
> > > +	/* Rename sysfs tree in new namespace */
> > > +	ret = device_rename(&device->dev, device->name);
> > > +	if (ret) {
> > > +		dev_warn(&device->dev, "Couldn't rename device\n");
> > > +		goto cg_cleanup;
> > > +	}
> > 
> > What is this doing? How can device_name and dev_name() become
> > inconsistent? I'm tring to delete device->name.
> > 
> Device net namespace changed, so sysfs entry should not show up in new namespace.
> So this rename call moves /sys/class/infiniband/<device> entry to new namespace.

Really? How peculiar.

So why isn't it 

  device_rename(&device->dev, dev_name(&device->dev))

?

> > > +int ib_dev_change_net_ns(struct ib_device *device, struct net *net) {
> > > +	int ret;
> > > +
> > > +	mutex_lock(&ib_device_mutex);
> > > +	/* Perform unregistration, registration sequence under a lock.
> > > +	 * This ensures that such rdma device doesn't get removed or
> > renamed
> > > +	 * while it is moving to different net namespace.
> > > +	 */
> > 
> > I don't like this implementation at all. The device_mutex was not intended
> > to be a registration lock, it already has an overly broad usage, we should not
> > make it worse.
> > 
> There are basically following threads in progress that needs to sync with each other for various reasons.
> 
> Device registration
> Device unregistration
> Net namespace deletion
> Net namespace creation
> Device rename
> Client registration
> Client unregistration
> 
> With existing code it uses device_mutex.

Well, no, the device_mutex is supposed to protect the device and other
lists, it is not a lock to be used against the entire registration
flow. If a lock for that is needed then we'll need a new lock

Jason



[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux