On Tue, Feb 18, 2020 at 05:19:27PM +0530, Selvin Xavier wrote:
> On Sat, Jan 4, 2020 at 1:15 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:
> >
> > >  static void __exit bnxt_re_mod_exit(void)
> > >  {
> > > -        struct bnxt_re_dev *rdev, *next;
> > > -        LIST_HEAD(to_be_deleted);
> > > +        struct bnxt_re_dev *rdev;
> > >
> > > +        flush_workqueue(bnxt_re_wq);
> > >          mutex_lock(&bnxt_re_dev_lock);
> > > -        /* Free all adapter allocated resources */
> > > -        if (!list_empty(&bnxt_re_dev_list))
> > > -                list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> > > -        mutex_unlock(&bnxt_re_dev_lock);
> > > -        /*
> > > -         * Cleanup the devices in reverse order so that the VF device
> > > -         * cleanup is done before PF cleanup
> > > -         */
> > > -        list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> > > -                dev_info(rdev_to_dev(rdev), "Unregistering Device");
> > > +        list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
> > >                  /*
> > > -                 * Flush out any scheduled tasks before destroying the
> > > -                 * resources
> > > +                 * Set unreg flag to avoid VF resource cleanup
> > > +                 * in module unload path. This is required because
> > > +                 * dealloc_driver for VF can come after PF cleaning
> > > +                 * the VF resources.
> > >                  */
> > > -                flush_workqueue(bnxt_re_wq);
> > > -                bnxt_re_dev_stop(rdev);
> > > -                bnxt_re_ib_uninit(rdev);
> > > -                /* Acquire the rtnl_lock as the L2 resources are freed here */
> > > -                rtnl_lock();
> > > -                bnxt_re_remove_device(rdev);
> > > -                rtnl_unlock();
> > > +                if (rdev->is_virtfn)
> > > +                        rdev->rcfw.res_deinit = true;
> > >          }
> > > +        mutex_unlock(&bnxt_re_dev_lock);
> >
> > This is super ugly. This driver already has bugs if it has a
> > dependency on driver unbinding order, as drivers can become unbound
> > from userspace using sysfs or hot-unplug in any ordering.
> >
> The dependency is from the HW side and not from the driver side.
> In some of the HW versions, the RoCE PF driver is allowed to allocate
> the host memory for the VFs, and this dependency comes from that.

> > If the VF driver somehow depends on the PF driver then destruction of
> > the PF must synchronize and fence the VFs during its own shutdown.
>
> Do you suggest that the synchronization should be done in the stack
> before invoking dealloc_driver for the VF?

'in the stack'? This is a driver problem. You can't assume ordering of
driver detaches in Linux, and drivers should really not be co-ordinating
across their instances.

> > But this seems very strange, how can it work if the VF is in a VM
> > or something and the PF driver is unplugged?
>
> This code is not handling the case where the VF is attached to a VM.
> The first command to the HW after the removal of the PF will fail with
> a timeout, and the driver will stop sending commands to the HW once it
> sees this timeout. VF driver removal will then proceed with cleaning up
> the host resources without sending any commands to the FW and exit the
> removal process.

So why not use this for the host case as well? The timeout is too long?

> On the hypervisor, if we don't set rdev->rcfw.res_deinit as done in
> this patch, then when VF removal is initiated the first command will
> time out and the driver will stop sending any more commands to the FW
> and proceed with removal. All the VFs will exit in the same way; it is
> just that each function will wait for one command to time out. Setting
> rdev->rcfw.res_deinit was added as a hack to avoid this waiting time.

The issue is that pci_driver_unregister undoes the driver binds in FIFO,
not LIFO, order?

What happens when the VF binds after the PF?

Jason
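
For illustration, a minimal sketch of the PF-side fencing discussed
above: it reuses the res_deinit mechanism from the patch, but drives it
from the PF's own teardown rather than from module exit, so the outcome
does not depend on the order in which drivers are unbound. The function
name bnxt_re_fence_vfs() and the idea of calling it from the PF's
remove/dealloc_driver path are assumptions made for the sketch, not the
actual bnxt_re code:

/*
 * Hypothetical sketch: fence VF instances from the PF's own teardown so
 * that VF cleanup never depends on driver unbind order. Reuses the
 * res_deinit flag introduced by the patch above.
 */
static void bnxt_re_fence_vfs(struct bnxt_re_dev *pf_rdev)
{
        struct bnxt_re_dev *rdev;

        dev_info(rdev_to_dev(pf_rdev), "Fencing VFs before PF teardown");

        mutex_lock(&bnxt_re_dev_lock);
        list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
                /*
                 * A real implementation would restrict this to the VFs
                 * that belong to pf_rdev; the sketch marks every VF.
                 */
                if (rdev->is_virtfn)
                        rdev->rcfw.res_deinit = true;
        }
        mutex_unlock(&bnxt_re_dev_lock);
}

Called before the PF frees the host memory it allocated for its VFs,
something like this would let a later VF dealloc_driver skip FW commands
(and the command timeout) regardless of whether the VF or the PF is
unbound first.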