On 11/30/2018 10:27 AM, Jason Gunthorpe wrote: > On Fri, Nov 30, 2018 at 09:40:52AM -0600, Steve Wise wrote: >> >> On 11/29/2018 10:54 PM, Jason Gunthorpe wrote: >>> On Fri, Nov 30, 2018 at 01:29:22AM +0000, Ruhl, Michael J wrote: >>> >>>> Should the mutex be held while calling the callback? If the newlink or dellink >>>> callbacks need to sleep (probably dellink specifically), I think holding the mutex >>>> might not be a good idea. >>> It would be a bit better if this lock was a rwsem, but it must be held >>> whenever ops is touched.. This is probably a moot point though when >>> dellink goes through the ib_dev not the ops list.. >>> >> I'm not sure what you mean, but my current code under test keeps the >> rdma_link_ops pointer cached in the struct ib_device. Then >> nldev_dellink() looks like this: >> >> + device = ib_device_get_by_index(index); >> + if (!device) >> + return -EINVAL; >> + >> + mutex_lock(&link_ops_mutex); >> + /* Deref the device before deleting it. Otherwise we >> + * deadlock unregistering the device. >> + */ >> + ib_device_put(device); >> + >> + if (device->link_ops) >> + err = device->link_ops->dellink(device); > The driver must directly guarantee that the lifetime of everything > under device, including link_ops, exists until ib_unregister_device() > returns. > > For something like link_ops this happens without any extra effort. > > So the lock is not needed to keep link_ops alive. > >> I still think the mutex is needed... Note I had to call ib_device_put() >> -before- calling the driver dellink function or we deadlocked doing the >> device unregister. > Ohh... this is all racy in rxe, and rxe has locking bugs :( > > rxe calls ib_unregister_device from module unload, sysfs/netlink and > netdev notifiers and does nothing to guarentee ib_unregsiter_device() > is called exactly once. > > ib_device_put like this is also racy as there is nothing holding the > kref in this case, this could get the kref before putting I suppose, > but then the core netlink code will try and put an already put device > and blamo... So I think you need to add some way to tell netlink that > this entire command should use kref not put ? > > and I don't entirely know how to fix everything wrong with rxe, but > broadly > - List manipulation of rxe_dev_list must always hold the dev_list_lock, > at least rxe_notify doesn't > - rxe_remove should do the list_del internally, and it should be > approximately: > > spin_lock(&dev_list_lock); > already_removed = list_empty(&rxe->list); > list_del_init(&rxe->list); > spin_unlock(&dev_list_lock); > > if (!already_removed) > ibv_unregister_device(&rxe->ibdev); > > - Something needs to prevent module unload until all ib devices are > unregistered. If the notifier already has internal locking then it > probably makes sense to keep link_ops_mutex in the core code, but as > a rwsem > > Jason So perhaps nldev_dellink() should just lookup the ib_device by index using ib_device_get_by_index(), note the device name (copy it locally) and rdma_link_ops pointer, and call ib_device_put(). Then call the driver's dellink() function, passing the device name. Thus the driver can coordinate the multiple ways the device could be unregistered. An alternative that I discussed with Leon, was having nldev_dellink() just traverse all the registered rdma_link_ops and calling each dellink() function, passing the device name, until one succeeds or they all fail. Simple... I'm going to avoid fixing already-existing RXE bugs as part of this series if I can... Steve.