On Sat, Dec 8, 2018 at 2:43 AM Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:
>
> On Fri, Dec 07, 2018 at 11:43:50AM +0530, Selvin Xavier wrote:
> > On Fri, Dec 7, 2018 at 12:27 AM Jason Gunthorpe <jgg@xxxxxxxxxxxx> wrote:
> > >
> > > On Thu, Dec 06, 2018 at 10:49:44AM -0800, Selvin Xavier wrote:
> > > > From: Somnath Kotur <somnath.kotur@xxxxxxxxxxxx>
> > > >
> > > > - Add more flags for better granularity in the rdev init/uninit path.
> > > > - bnxt_re_ib_reg() handles two main functionalities - initializing
> > > >   the device and registering with the IB stack. Split it into two
> > > >   functions, i.e. bnxt_re_dev_init() and bnxt_re_ib_init(), to
> > > >   improve modularity. Do the same for bnxt_re_ib_unreg(), i.e. split
> > > >   it into bnxt_re_dev_uninit() and bnxt_re_ib_uninit().
> > > > - Since the bnxt_re_dev_init() part of bnxt_re_ib_reg() takes
> > > >   rtnl_lock internally for the entire function, it is better to
> > > >   invoke it from bnxt_netdev_event() itself, where rtnl_lock is
> > > >   already held for the NETDEV event.
> > >
> > > Drivers cannot hold rtnl_lock while calling ib_unregister_device() (or
> > > really any lock, unless done carefully). Looks like bnxt does this
> > > already and perhaps this patch makes it worse?
> > >
> > Not really. You are right that bnxt is already calling
> > ib_unregister_device from rtnl_lock.
> > The current code has three instances of calling ib_unregister_device
> > from rtnl_lock context; we are removing two of them in this patch.
> >
> > The uninit routine is split into two: bnxt_re_ib_uninit and
> > bnxt_re_remove_device. bnxt_re_ib_uninit calls ib_unregister_device,
> > and bnxt_re_remove_device destroys the resources from HW.
> >
> > In this patch, we call bnxt_re_ib_uninit outside rtnl_lock and
> > bnxt_re_remove_device with rtnl_lock held
> > (except in bnxt_re_shutdown, which I will explain later).
> >
> > In case of mod_exit, bnxt_re_ib_uninit is invoked first, outside the
> > rtnl_lock. After this, rtnl_lock is acquired and bnxt_re_remove_device
> > is invoked.
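
To make that ordering concrete, a rough sketch (the helper name below is
made up, and the real mod_exit path walks the whole bnxt_re_dev_list
rather than a single device):

/* Sketch only: the ordering of the split uninit described above */
static void bnxt_re_teardown_one(struct bnxt_re_dev *rdev)
{
	/* ib_unregister_device() must not run under rtnl_lock */
	bnxt_re_ib_uninit(rdev);

	/* HW/resource teardown still happens with rtnl_lock held */
	rtnl_lock();
	bnxt_re_remove_device(rdev);
	rtnl_unlock();
}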
>
> Yuk.
>
> I'm looking at this driver and I see the same basic racy mess I see in
> rxe regarding lifetime and locking. Look at the patch I just sent to
> Steve and consider using its new functions in this driver too, instead
> of the bnxt_re_dev_list and other related bugginess.
>
> Obviously the bnxt_re_shutdown problem has to be fixed too..
>
> > >> @@ -1555,9 +1581,14 @@ static int bnxt_re_netdev_event(struct notifier_block *notifier,
> > >>  		 */
> > >>  		if (atomic_read(&rdev->sched_count) > 0)
> > >>  			goto exit;
> > >> -		bnxt_re_ib_unreg(rdev);
> > >> -		bnxt_re_remove_one(rdev);
> > >> -		bnxt_re_dev_unreg(rdev);
> > >> +		/* Schedule work for unregistering from IB stack */
> > >> +		if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags)) {
> > >> +			sch_work = true;
> > >> +			break;
> > >> +		}
> >
> > If the IB device is registered, we schedule work to invoke
> > ib_unregister_device and return from the notifier.
> > So bnxt_re_ib_uninit is invoked from the bnxt_re_task.
>
> BNXT_RE_FLAG_IBDEV_REGISTERED is also all racy and wrong.
>
> > Since the netdev reference is not released by the bnxt_re driver (the
> > ref is released in bnxt_re_remove_device), the network stack will
> > invoke the notifiers again until all the references are released.
> > During the second or later invocation of the netdev notifier, we
> > proceed with bnxt_re_remove_device.
>
> That is also racy. unregister_netdevice_notifier() could have
> happened between the first and second call, so now it leaks.
>
> If ib_unregister happens, the driver must do the remove_device stuff
> immediately.
>
> So, I think you need to use that stuff I sent to Steve. This patch
> might be an improvement if the above is fixed, maybe; it isn't
> clear. (And Steve, now that I see the same mess in bnxt, I'm more than
> convinced that is the right approach to your problem too.)
>
> However, I'd rather see this as part of a series that comprehensively
> fixes all of this.

This patch is definitely an improvement on the existing driver issues,
but I am okay with reworking on top of your patch. I will take your
patch, test it, and get back. Was your patch cut against the for-next
branch? It was not applying cleanly on for-next.

>
> Jason
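
For reference, a rough sketch of the deferred-unregister flow under
discussion. The unreg_work member, bnxt_re_unreg_work_fn() and the
bnxt_re_from_netdev() lookup are hypothetical stand-ins (the real driver
defers through bnxt_re_task()), and Jason's objection above is exactly
that gating on BNXT_RE_FLAG_IBDEV_REGISTERED like this is racy:

/* Sketch only, simplified from the hunk and explanation quoted above */
static void bnxt_re_unreg_work_fn(struct work_struct *work)
{
	struct bnxt_re_dev *rdev =
		container_of(work, struct bnxt_re_dev, unreg_work);

	/* Process context with no rtnl_lock held, so ib_unregister_device()
	 * is safe here. Assume this also clears BNXT_RE_FLAG_IBDEV_REGISTERED.
	 */
	bnxt_re_ib_uninit(rdev);
}

static int bnxt_re_netdev_event(struct notifier_block *notifier,
				unsigned long event, void *ptr)
{
	struct net_device *netdev = netdev_notifier_info_to_dev(ptr);
	struct bnxt_re_dev *rdev = bnxt_re_from_netdev(netdev);

	if (!rdev || event != NETDEV_UNREGISTER)
		return NOTIFY_DONE;

	if (test_bit(BNXT_RE_FLAG_IBDEV_REGISTERED, &rdev->flags))
		/* First pass: defer the IB unregister to process context */
		schedule_work(&rdev->unreg_work);
	else
		/* Later pass: the netdev ref is still held, so the notifier
		 * fires again; finish HW teardown while rtnl_lock is held.
		 */
		bnxt_re_remove_device(rdev);

	return NOTIFY_DONE;
}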