On Mon, Feb 24, 2020 at 7:24 PM Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote: > > > static void bnxt_re_stop_irq(void *handle) > > @@ -1317,7 +1320,37 @@ static void bnxt_re_query_hwrm_intf_version(struct bnxt_re_dev *rdev) > > le16_to_cpu(resp.hwrm_intf_patch); > > } > > > > -static void bnxt_re_ib_unreg(struct bnxt_re_dev *rdev) > > +static void bnxt_re_ib_uninit(struct bnxt_re_dev *rdev) > > +{ > > + /* Cleanup ib dev */ > > + if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) { > > + ib_unregister_device(&rdev->ibdev); > > + clear_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags); > > + } > > The reason ib_unregister_device_queued exists is because you can't > call unregistration while holding RTNL. > > > case NETDEV_UNREGISTER: > > @@ -1704,9 +1727,8 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier, > > */ > > if (atomic_read(&rdev->sched_count) > 0) > > goto exit; > > - bnxt_re_ib_unreg(rdev); > > - bnxt_re_remove_one(rdev); > > - bnxt_re_dev_unreg(rdev); > > + bnxt_re_ib_uninit(rdev); > > + bnxt_re_remove_device(rdev); > > break; > > ie here. > > This *must* simply be a call to ib_unregister_device_queued() and all > this other stuff has to go into the dealloc. > > As written this is all kinds of deadlocking and racy > This patch (patch 2 of this series) was to refactor the existing code and group the reg/unreg operations as bnxt_re_add_device/bnxt_re_remove_device. The third patch in this series is introducing the dealloc_driver hook and changing the code as you suggested by calling ib_unregister_device_queued/ ib_unregister_driver. I didn't want to squash these two patches together. -Selvin > Jason