On Sat, Dec 8, 2018 at 2:43 AM Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:
>
> On Fri, Dec 07, 2018 at 11:43:50AM +0530, Selvin Xavier wrote:
> > On Fri, Dec 7, 2018 at 12:27 AM Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Dec 06, 2018 at 10:49:44AM -0800, Selvin Xavier wrote:
> > > > From: Somnath Kotur <somnath.kotur@xxxxxxxxxxxx>
> > > >
> > > > - Add more flags for better granularity in the rdev init/uninit path.
> > > > - bnxt_re_ib_reg() handles two main functionalities - initializing
> > > >   the device and registering with the IB stack. Split it into two
> > > >   functions, i.e. bnxt_re_dev_init() and bnxt_re_ib_init(), to
> > > >   improve modularity. Do the same for bnxt_re_ib_unreg(), i.e. split
> > > >   it into bnxt_re_dev_uninit() and bnxt_re_ib_uninit().
> > > > - Since the bnxt_re_dev_init() part of bnxt_re_ib_reg() takes
> > > >   rtnl_lock internally for the entire function, it is better to
> > > >   invoke it from bnxt_netdev_event() itself, where rtnl_lock is
> > > >   already held for the NETDEV event.
> > >
> > > Drivers cannot hold rtnl_lock while calling ib_unregister_device() (or
> > > really any lock, unless done carefully). Looks like bnxt does this
> > > already and perhaps this patch makes it worse?
> > >
> > Not really. You are right that bnxt is already calling
> > ib_unregister_device from rtnl_lock.
> > The current code has three instances of calling ib_unregister_device
> > from rtnl_lock context; we are removing two of them in this patch.
> >
> > The uninit routine is split into two: bnxt_re_ib_uninit and
> > bnxt_re_remove_device. bnxt_re_ib_uninit calls ib_unregister_device,
> > and bnxt_re_remove_device destroys the resources from HW.
> >
> > In this patch, we call bnxt_re_ib_uninit outside rtnl_lock and
> > bnxt_re_remove_device with rtnl_lock held
> > (except in bnxt_re_shutdown, which I will explain later).
> >
> > In case of mod_exit, bnxt_re_ib_uninit is invoked first, outside the
> > rtnl_lock. After this, rtnl_lock is acquired and bnxt_re_remove_device
> > is invoked.
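
To make that ordering concrete, a rough sketch (the helper name below is
made up, and the real mod_exit path walks the whole bnxt_re_dev_list
rather than a single device):

/* Sketch only: the ordering of the split uninit described above */
static void bnxt_re_teardown_one(struct bnxt_re_dev *rdev)
{
	/* ib_unregister_device() must not run under rtnl_lock */
	bnxt_re_ib_uninit(rdev);

	/* HW/resource teardown still happens with rtnl_lock held */
	rtnl_lock();
	bnxt_re_remove_device(rdev);
	rtnl_unlock();
}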
>
> Yuk.
>
> I'm looking at this driver and I see the same basic racy mess I see in
> rxe regarding lifetime and locking. Look at the patch I just sent to
> Steve and consider using its new functions in this driver too, instead
> of the bnxt_re_dev_list and other related bugginess.
>
> Obviously the bnxt_re_shutdown problem has to be fixed too..
>
> > >> @@ -1555,9 +1581,14 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
> > >>  		 */
> > >>  		if (atomic_read(&rdev->sched_count) > 0)
> > >>  			goto exit;
> > >> -		bnxt_re_ib_unreg(rdev);
> > >> -		bnxt_re_remove_one(rdev);
> > >> -		bnxt_re_dev_unreg(rdev);
> > >> +		/* Schedule work for unregistering from IB stack */
> > >> +		if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
> > >> +			sch_work = true;
> > >> +			break;
> > >> +		}
> >
> > If the IB device is registered, we schedule work to invoke
> > ib_unregister_device and return from the notifier.
> > So bnxt_re_ib_uninit is invoked from the bnxt_re_task.
>
> BNXT_RE_FLAG_IBDEV_REGISTERED is also all racy and wrong.
>
> > Since the netdev reference is not released by the bnxt_re driver (the
> > ref is released in bnxt_re_remove_device), the network stack will
> > invoke the notifiers again until all the references are released.
> > During the second or later invocation of the netdev notifier, we
> > proceed with bnxt_re_remove_device.
>
> That is also racy. unregister_netdevice_notifier() could have
> happened between the first and second call, so now it leaks.
>
> If ib_unregister happens, the driver must do the remove_device stuff
> immediately.
>
> So, I think you need to use that stuff I sent to Steve. This patch
> might be an improvement if the above is fixed, maybe; it isn't
> clear. (And Steve, now that I see the same mess in bnxt, I'm more than
> convinced that is the right approach to your problem too.)
>
> However, I'd rather see this as part of a series that comprehensively
> fixes all of this.

This patch is definitely an improvement on the existing driver issues,
but I am okay with reworking on top of your patch. I will take your
patch, test it, and get back. Was your patch cut against the for-next
branch? It was not applying cleanly on for-next.

>
> Jason
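
For reference, a rough sketch of the deferred-unregister flow under
discussion. The unreg_work member, bnxt_re_unreg_work_fn() and the
bnxt_re_from_netdev() lookup are hypothetical stand-ins (the real driver
defers through bnxt_re_task()), and Jason's objection above is exactly
that gating on BNXT_RE_FLAG_IBDEV_REGISTERED like this is racy:

/* Sketch only, simplified from the hunk and explanation quoted above */
static void bnxt_re_unreg_work_fn(struct work_struct *work)
{
	struct bnxt_re_dev *rdev =
		container_of(work, struct bnxt_re_dev, unreg_work);

	/* Process context with no rtnl_lock held, so ib_unregister_device()
	 * is safe here. Assume this also clears BNXT_RE_FLAG_IBDEV_REGISTERED.
	 */
	bnxt_re_ib_uninit(rdev);
}

static int bnxt_re_netdev_event(struct notifier_block *notifier,
				unsigned long event, void *ptr)
{
	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
	struct bnxt_re_dev *rdev = bnxt_re_from_netdev(netdev);

	if (!rdev || event != NETDEV_UNREGISTER)
		return NOTIFY_DONE;

	if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags))
		/* First pass: defer the IB unregister to process context */
		schedule_work(&rdev->unreg_work);
	else
		/* Later pass: the netdev ref is still held, so the notifier
		 * fires again; finish HW teardown while rtnl_lock is held.
		 */
		bnxt_re_remove_device(rdev);

	return NOTIFY_DONE;
}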