On Tue, 2018-12-11 at 23:06 +0000, Jason Gunthorpe wrote: > On Tue, Dec 11, 2018 at 02:15:41PM -0800, Parvi Kaustubhi wrote: > > From: Govindarajulu Varadarajan <gvaradar@xxxxxxxxx> > > > > There is a dead lock in usnic ib_register() and netdev_notify() path. > > > > usnic_ib_discover_pf() > > > mutex_lock(&usnic_ib_ibdev_list_lock); > > | usnic_ib_device_add(); > > | ib_register_device() > > | usnic_ib_query_port() > > | mutex_lock(&us_ibdev->usdev_lock); > > | ib_get_eth_speed() > > | rtnl_lock() > > > > Order of locking: &usnic_ib_ibdev_list_lock -> usdev_lock -> rtnl_lock > > > > rtnl_lock() > > > usnic_ib_netdevice_event() > > | mutex_lock(&usnic_ib_ibdev_list_lock); > > > > Order of locking: rtnl_lock -> &usnic_ib_ibdev_list_lock > > > > The solution is to not handle usnic_ib_netdevice_event() with the rtnl > > lock held. > > This commit creates a single threaded workqueue to defer the handling of > > netdev/inet events. > > > > Signed-off-by: Govindarajulu Varadarajan <gvaradar@xxxxxxxxx> > > Signed-off-by: Parvi Kaustubhi <pkaustub@xxxxxxxxx> > > --- > > drivers/infiniband/hw/usnic/usnic_ib.h | 6 +++ > > drivers/infiniband/hw/usnic/usnic_ib_main.c | 60 ++++++++++++++++++++++++--- > > -- > > 2 files changed, 56 insertions(+), 10 deletions(-) > > I'm really getting sick of seeing this same code pattern in the > ethernet connecting drivers.. I think this is the 3rd time now I've > seen a fix for this same rtnl locking bug in a driver. > Is the issue here using a workqueue to handle notify? > Does anyone want to make a proper driver core based struct > device_driver for this discovery model??? > Can you elaborate on this? I do not follow. > Also, this surely has the same bugs as rxe and bnxt, see those recent > threads too. Removing usnic_verbs modules works fine. I do not see issue. I will have to try this with lockdep on.