Re: [PATCH for-next 5/6] RDMA/bnxt_re: Use driver_unregister and unregistration API

On Wed, Feb 19, 2020 at 5:45 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Tue, Feb 18, 2020 at 05:19:27PM +0530, Selvin Xavier wrote:
> > On Sat, Jan 4, 2020 at 1:15 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > >
> > > On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:
> > >
> > > >  static void __exit bnxt_re_mod_exit(void)
> > > >  {
> > > > -     struct bnxt_re_dev *rdev, *next;
> > > > -     LIST_HEAD(to_be_deleted);
> > > > +     struct bnxt_re_dev *rdev;
> > > >
> > > > +     flush_workqueue(bnxt_re_wq);
> > > >       mutex_lock(&bnxt_re_dev_lock);
> > > > -     /* Free all adapter allocated resources */
> > > > -     if (!list_empty(&bnxt_re_dev_list))
> > > > -             list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> > > > -     mutex_unlock(&bnxt_re_dev_lock);
> > > > -       /*
> > > > -     * Cleanup the devices in reverse order so that the VF device
> > > > -     * cleanup is done before PF cleanup
> > > > -     */
> > > > -     list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> > > > -             dev_info(rdev_to_dev(rdev), "Unregistering Device");
> > > > +     list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
> > > >               /*
> > > > -              * Flush out any scheduled tasks before destroying the
> > > > -              * resources
> > > > +              * Set unreg flag to avoid VF resource cleanup
> > > > +              * in module unload path. This is required because
> > > > +              * dealloc_driver for VF can come after PF cleaning
> > > > +              * the VF resources.
> > > >                */
> > > > -             flush_workqueue(bnxt_re_wq);
> > > > -             bnxt_re_dev_stop(rdev);
> > > > -             bnxt_re_ib_uninit(rdev);
> > > > -             /* Acquire the rtnl_lock as the L2 resources are freed here */
> > > > -             rtnl_lock();
> > > > -             bnxt_re_remove_device(rdev);
> > > > -             rtnl_unlock();
> > > > +             if (rdev->is_virtfn)
> > > > +                     rdev->rcfw.res_deinit = true;
> > > >       }
> > > > +     mutex_unlock(&bnxt_re_dev_lock);
> > >
> > > This is super ugly. This driver already has bugs if it has a
> > > dependency on driver unbinding order as drivers can become unbound
> > > from userspace using sysfs or hot un-plug in any ordering.
> > >
> > The dependency comes from the HW side, not the driver side. On some
> > HW versions, the RoCE PF driver is allowed to allocate host memory
> > on behalf of its VFs, and this is where the dependency comes from.
> > > If the VF driver somehow depends on the PF driver then destruction of
> > > the PF must synchronize and fence the VFs during it's own shutdown.
> >
> > Do you suggest that the synchronization should be done in the stack
> > before invoking dealloc_driver for the VF?
>
> 'in the stack'? This is a driver problem.. You can't assume ordering
> of driver detaches in Linux, and drivers should really not be
> co-ordinating across their instances.
>
> > > But this seems very strange, how can it work if the VF is in a VM
> > > or something and the PF driver is unplugged?
>
> > This code is not handling the case where the VF is attached to a VM.
> > The first command to HW after the removal of the PF will fail with a
> > timeout. The driver will stop sending commands to HW once it sees
> > this timeout. VF driver removal will proceed with cleaning up the
> > host resources, without sending any commands to FW, and exit the
> > removal process.
>
> So why not use this for the host case as well? The timeout is too
> long?
Yeah. The timeout for the first command is 20 sec now. Maybe I can use
a smaller timeout in the unreg path.
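The idea above can be sketched in a few lines; a minimal toy model with hypothetical names and timeout values (the real driver keeps this state in its RCFW command infrastructure, and the actual unreg-path timeout is not specified in this thread):

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical timeouts, in milliseconds. 20 sec matches the current
 * first-command timeout mentioned above; the unreg value is invented. */
#define CMD_TIMEOUT_NORMAL_MS  20000U
#define CMD_TIMEOUT_UNREG_MS    1000U

/* Pick the firmware-command timeout based on whether the device is
 * currently on the unregister/teardown path. */
static unsigned int cmd_timeout_ms(bool in_unreg_path)
{
        return in_unreg_path ? CMD_TIMEOUT_UNREG_MS : CMD_TIMEOUT_NORMAL_MS;
}
```

The point is only that the long wait per function is bounded by a teardown-specific value instead of the normal-operation timeout.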
>
> > On the hypervisor, if we don't set rdev->rcfw.res_deinit as done in
> > this patch, then when VF removal is initiated, the first command will
> > time out and the driver will stop sending any more commands to FW and
> > proceed with removal. All VFs will exit the same way; just that each
> > function will wait for one command to time out. Setting
> > rdev->rcfw.res_deinit was added as a hack to avoid this waiting time.
>
> The issue is that pci_driver_unregister undoes the driver binds in
> FIFO not LIFO order?
>
> What happens when the VF binds after the PF?

This is not dependent on PCI driver unregister. This particular issue
happens when the bnxt_re driver alone is unloaded and the new
ib_unregister_driver is invoked by bnxt_re in the mod_exit hook.
dealloc_driver for each IB device is called mostly in FIFO order
(using xa_for_each). So, since the PF IB device was added first, it
gets removed first, and then the VF is removed.
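The FIFO behavior described above can be illustrated with a toy model
(plain C with invented names, not the kernel xarray API): walking the
device table in ascending-index order means the PF, registered first at
the lowest index, is torn down before its VFs, which is exactly the
problematic ordering:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the IB device table; the kernel keeps devices in an
 * xarray, and xa_for_each() likewise visits entries in ascending index
 * order (i.e. roughly registration order). */
struct toy_dev { const char *name; };

/* Visit devices in index order, recording the teardown order in 'out'
 * and returning how many were removed. */
static size_t unregister_all(struct toy_dev *tbl[], size_t n,
                             const char *out[])
{
        size_t i, removed = 0;

        for (i = 0; i < n; i++) {
                if (tbl[i]) {
                        out[removed++] = tbl[i]->name;
                        tbl[i] = NULL;  /* the dealloc_driver step */
                }
        }
        return removed;
}
```

With the PF registered before the VF, the PF comes out of this loop
first, so any VF cleanup that needs PF-owned resources runs too late.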

After this discussion, I now feel it's better to remove the hack and
allow the commands to fail with a timeout and exit.
The issue is seen only when users try to unload bnxt_re alone, without
destroying the VFs. The chances of this type of usage are slim.
Anyway, it's not a fatal error if a user tries this sequence; rmmod
will just take some time to exit.
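The proposed fallback — let the first command time out, then stop
issuing further commands to FW and clean up host resources only — can be
sketched as follows (hypothetical names and error values, standing in
for the driver's real command-queue state):

```c
#include <assert.h>
#include <stdbool.h>

/* Toy command-channel state; the real driver keeps equivalent state in
 * its RCFW structure. */
struct toy_cmdq {
        bool fw_dead;           /* set once a command has timed out */
};

/* Invented result codes for the sketch. */
enum { CMD_OK = 0, CMD_ETIMEDOUT = -110, CMD_ESHUTDOWN = -108 };

/* Issue one command. After the first timeout, fail fast without
 * touching the (presumably gone) firmware again, so only the very
 * first command on the teardown path pays the full timeout. */
static int toy_send_cmd(struct toy_cmdq *q, bool fw_responds)
{
        if (q->fw_dead)
                return CMD_ESHUTDOWN;   /* skip HW; host-only cleanup */
        if (!fw_responds) {
                q->fw_dead = true;      /* remember the first timeout */
                return CMD_ETIMEDOUT;
        }
        return CMD_OK;
}
```

So at worst one command per function waits out the timeout; everything
after that returns immediately, which is why the sequence is slow but
not fatal.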

Shall I repost with this hack removed?
>
> Jason


