> -----Original Message----- > From: Jason Gunthorpe <jgg@xxxxxxxx> > Sent: Saturday, January 12, 2019 11:12 AM > To: Parav Pandit <parav@xxxxxxxxxxxx> > Cc: Leon Romanovsky <leon@xxxxxxxxxx>; Doug Ledford > <dledford@xxxxxxxxxx>; Leon Romanovsky <leonro@xxxxxxxxxxxx>; RDMA > mailing list <linux-rdma@xxxxxxxxxxxxxxx> > Subject: Re: [PATCH rdma-next 4/7] RDMA/core: Introduce a helper function > to change net namespace of rdma device > > On Sat, Jan 12, 2019 at 04:34:20AM +0000, Parav Pandit wrote: > > > > > > > From: Jason Gunthorpe <jgg@xxxxxxxx> > > > Sent: Friday, January 11, 2019 4:43 PM > > > To: Leon Romanovsky <leon@xxxxxxxxxx> > > > Cc: Doug Ledford <dledford@xxxxxxxxxx>; Leon Romanovsky > > > <leonro@xxxxxxxxxxxx>; RDMA mailing list > > > <linux-rdma@xxxxxxxxxxxxxxx>; Parav Pandit <parav@xxxxxxxxxxxx> > > > Subject: Re: [PATCH rdma-next 4/7] RDMA/core: Introduce a helper > > > function to change net namespace of rdma device > > > > > > On Tue, Dec 18, 2018 at 02:28:33PM +0200, Leon Romanovsky wrote: > > > > > > > +static int _ib_dev_change_net_ns(struct ib_device *device, struct > > > > +net > > > > +*net) { > > > > + int ret = -EINVAL; > > > > + > > > > + if (net_eq(rdma_dev_net(device), net)) > > > > + return 0; > > > > + > > > > + /* > > > > + * Currently supported only for those providers which supports > > > > + * disassociation and those which doesn't do port specific sysfs > > > > + * init. Once a port_cleanup infrastructure is implemented, this > > > > + * limitation will be removed. > > > > + */ > > > > + if (!device->ops.disassociate_ucontext || device->ops.port_init) > > > > + return -EOPNOTSUPP; > > > > > > This doesn't drain the refcount - why not? > > > > > Refcount is drained below in unregister_remove_clients(). > > Above is just a check if device can moved to net namespace or not. > > That function is badly named then > > > > > + unregister_remove_clients(device); > > > > + rdma_compatdev_remove(device); > > > > + ib_device_unregister_rdmacg(device); > > > > + > > > > + ib_gid_table_cleanup_one(device); > > > > + free_client_contexts(device); > > > > + device->reg_state = IB_DEV_UNREGISTERED; > > > > > > Now the device is unregistered but still on the device list? Yikes, > > > that is quite bad. > > No. no. It is not on the device list. You probably missed > unregister_remove_clients(). > > Device is first taken off from the list, followed by removing all clients(), > which is exact reverse sequence of registration. > > But that is worse! Now we have incompletely unregistered netdev that is not > on any tracking list! We are back to the thing you brough up earlier where > the name uniqueness no longer works as expected. > It works because device_mutex is still held so it sync with other device registration and also device renaming happening in parallel. > So I think it has to stay on the list and the list needs to handle a partially > unregistered device. I have patches for that.. > > > > > + device->netns_shared = 0; > > > > + rdma_coredev_net_set(&device->coredev, net); > > > > + > > > > + rdma_roce_rescan_device(device); > > > > + ib_device_cache_update(device); > > > > + > > > > + ret = ib_device_register_rdmacg(device); > > > > + if (ret) { > > > > + dev_warn(&device->dev, > > > > + "Couldn't register device with rdma cgroup\n"); > > > > + goto out; > > > > > > And this leaves an unregistered device on the device_list!?! That is > > > really not OK. > > No. it is not on the list as I described above. > > Well, in that case we are leaking a device that is half exsting into some kind > of bogus untracked state is very bad. It plays very poorly with the patch I > already posted for Steve to do 'unregister driver' as we have lost track of > ib_devices. > In _ib_dev_change_net_ns() there are only two functions that can fail. One of them rdmacg_register_device() always return 0. I should supply a patch to simplify this path regardless. Second one is device_rename(), which should be rare event. Device warning is printed when namespace is deleted and additionally user error code is returned too if this was user initiated. So it shouldn't happen unless a trusted users kept the same names. > > > > > > > + /* Rename sysfs tree in new namespace */ > > > > + ret = device_rename(&device->dev, device->name); > > > > + if (ret) { > > > > + dev_warn(&device->dev, "Couldn't rename device\n"); > > > > + goto cg_cleanup; > > > > + } > > > > > > What is this doing? How can device_name and dev_name() become > > > inconsistent? I'm tring to delete device->name. > > > > > Device net namespace changed, so sysfs entry should not show up in new > namespace. > > So this rename call moves /sys/class/infiniband/<device> entry to new > namespace. > > Really? How peculiar. > > So why isn't it > > device_rename(&device->dev, dev_name(&device->dev)) > > ? > Yes, this better. It reduces the dependency on device->name. I will improve this. > > > > +int ib_dev_change_net_ns(struct ib_device *device, struct net *net) { > > > > + int ret; > > > > + > > > > + mutex_lock(&ib_device_mutex); > > > > + /* Perform unregistration, registration sequence under a lock. > > > > + * This ensures that such rdma device doesn't get removed or > > > renamed > > > > + * while it is moving to different net namespace. > > > > + */ > > > > > > I don't like this implementation at all. The device_mutex was not > > > intended to be a registration lock, it already has an overly broad > > > usage, we should not make it worse. > > > > > There are basically following threads in progress that needs to sync with > each other for various reasons. > > > > Device registration > > Device unregistration > > Net namespace deletion > > Net namespace creation > > Device rename > > Client registration > > Client unregistration > > > > With existing code it uses device_mutex. > > Well, no, the device_mutex is supposed to protect the device and other lists, > it is not a lock to be used against the entire registration flow. If a lock for > that is needed then we'll need a new lock > Ok. let me rework on this part of the series with new lock that protects these operations and reduce dependency on device mutex.