RE: [PATCH rdma-next 4/7] RDMA/core: Introduce a helper function to change net namespace of rdma device

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> -----Original Message-----
> From: Jason Gunthorpe <jgg@xxxxxxxx>
> Sent: Friday, January 11, 2019 4:43 PM
> To: Leon Romanovsky <leon@xxxxxxxxxx>
> Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky
> <leonro@xxxxxxxxxxxx>; RDMA mailing list <linux-rdma@xxxxxxxxxxxxxxx>;
> Parav Pandit <parav@xxxxxxxxxxxx>
> Subject: Re: [PATCH rdma-next 4/7] RDMA/core: Introduce a helper function
> to change net namespace of rdma device
> 
> On Tue, Dec 18, 2018 at 02:28:33PM +0200, Leon Romanovsky wrote:
> 
> > +static int _ib_dev_change_net_ns(struct ib_device *device, struct net
> > +*net) {
> > +	int ret = -EINVAL;
> > +
> > +	if (net_eq(rdma_dev_net(device), net))
> > +		return 0;
> > +
> > +	/*
> > +	 * Currently supported only for those providers which supports
> > +	 * disassociation and those which doesn't do port specific sysfs
> > +	 * init. Once a port_cleanup infrastructure is implemented, this
> > +	 * limitation will be removed.
> > +	 */
> > +	if (!device->ops.disassociate_ucontext || device->ops.port_init)
> > +		return -EOPNOTSUPP;
> 
> This doesn't drain the refcount - why not?
>
Refcount is drained below in unregister_remove_clients().
Above is just a check if device can moved to net namespace or not.
 
> > +	unregister_remove_clients(device);
> > +	rdma_compatdev_remove(device);
> > +	ib_device_unregister_rdmacg(device);
> > +
> > +	ib_gid_table_cleanup_one(device);
> > +	free_client_contexts(device);
> > +	device->reg_state = IB_DEV_UNREGISTERED;
> 
> Now the device is unregistered but still on the device list? Yikes, that is quite
> bad.
No. no. It is not on the device list. You probably missed unregister_remove_clients().
Device is first taken off from the list, followed by removing all clients(), which is exact reverse sequence of registration.

> 
> > +	device->netns_shared = 0;
> > +	rdma_coredev_net_set(&device->coredev, net);
> > +
> > +	rdma_roce_rescan_device(device);
> > +	ib_device_cache_update(device);
> > +
> > +	ret = ib_device_register_rdmacg(device);
> > +	if (ret) {
> > +		dev_warn(&device->dev,
> > +			 "Couldn't register device with rdma cgroup\n");
> > +		goto out;
> 
> And this leaves an unregistered device on the device_list!?! That is really not
> OK.
No. it is not on the list as I described above.

> 
> > +	/* Rename sysfs tree in new namespace */
> > +	ret = device_rename(&device->dev, device->name);
> > +	if (ret) {
> > +		dev_warn(&device->dev, "Couldn't rename device\n");
> > +		goto cg_cleanup;
> > +	}
> 
> What is this doing? How can device_name and dev_name() become
> inconsistent? I'm tring to delete device->name.
> 
Device net namespace changed, so sysfs entry should not show up in new namespace.
So this rename call moves /sys/class/infiniband/<device> entry to new namespace.

> > +int ib_dev_change_net_ns(struct ib_device *device, struct net *net) {
> > +	int ret;
> > +
> > +	mutex_lock(&ib_device_mutex);
> > +	/* Perform unregistration, registration sequence under a lock.
> > +	 * This ensures that such rdma device doesn't get removed or
> renamed
> > +	 * while it is moving to different net namespace.
> > +	 */
> 
> I don't like this implementation at all. The device_mutex was not intended
> to be a registration lock, it already has an overly broad usage, we should not
> make it worse.
> 
There are basically following threads in progress that needs to sync with each other for various reasons.

Device registration
Device unregistration
Net namespace deletion
Net namespace creation
Device rename
Client registration
Client unregistration

With existing code it uses device_mutex.

> The patch I was working on with Steve introduces a proper lock for
> registration, I'd rather go in that direction..
> 
I am reviewing the alternative.

> I have some stuff that improves this related to the stuff for Steve, but I'm
> not sure what this should even do, semantically..
>
Even with improvements, above parallel threads needs to be in sync.
I am reviewing the alternative.
 
> Having some crazy half registered state is super ugly..
> 
You missed the unregister_remove_clients() due to which it looks crazy.
Better thing to do is, split remove from list and removing clients as two different function calls to bring code clarity.
I am reviewing that.

> > +/**
> > + * rdma_dev_net - Return net namespace of the ib device
> > + * @dev:	Pointer to ib_device whose net namespace to read
> > + * rdma_dev_net() returns the net namespace of the rdma device.
> > + * This API is usable only to those ULPs which register with the core
> > +using
> > + * using ib_register_client() API. This ensures that ib_clients have
> > +access
> > + * to stable net namespace of rdma device without a complex locking
> scheme.
> > + * This API should not be used by the provider drivers, as provider
> > +drivers
> > + * are not yet made aware of the rdma device's net namespace movement
> changes.
> > + */
> 
> This should be clarified that it is only OK for clients to use becuase clients
> must stop making any ib_device call after remove().
>
Yes. but we want orchestrator to explicitly move upper devices to net namespace, this API is now moving to core_priv.h and going to be just a helper.

 
> Jason




[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux