On Wed, Feb 19, 2020 at 5:45 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>
> On Tue, Feb 18, 2020 at 05:19:27PM +0530, Selvin Xavier wrote:
> > On Sat, Jan 4, 2020 at 1:15 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> > >
> > > On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:
> > >
> > > >  static void __exit bnxt_re_mod_exit(void)
> > > >  {
> > > > -	struct bnxt_re_dev *rdev, *next;
> > > > -	LIST_HEAD(to_be_deleted);
> > > > +	struct bnxt_re_dev *rdev;
> > > >
> > > > +	flush_workqueue(bnxt_re_wq);
> > > >  	mutex_lock(&bnxt_re_dev_lock);
> > > > -	/* Free all adapter allocated resources */
> > > > -	if (!list_empty(&bnxt_re_dev_list))
> > > > -		list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> > > > -	mutex_unlock(&bnxt_re_dev_lock);
> > > > -	/*
> > > > -	 * Cleanup the devices in reverse order so that the VF device
> > > > -	 * cleanup is done before PF cleanup
> > > > -	 */
> > > > -	list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> > > > -		dev_info(rdev_to_dev(rdev), "Unregistering Device");
> > > > +	list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
> > > >  		/*
> > > > -		 * Flush out any scheduled tasks before destroying the
> > > > -		 * resources
> > > > +		 * Set unreg flag to avoid VF resource cleanup
> > > > +		 * in module unload path. This is required because
> > > > +		 * dealloc_driver for VF can come after PF cleaning
> > > > +		 * the VF resources.
> > > >  		 */
> > > > -		flush_workqueue(bnxt_re_wq);
> > > > -		bnxt_re_dev_stop(rdev);
> > > > -		bnxt_re_ib_uninit(rdev);
> > > > -		/* Acquire the rtnl_lock as the L2 resources are freed here */
> > > > -		rtnl_lock();
> > > > -		bnxt_re_remove_device(rdev);
> > > > -		rtnl_unlock();
> > > > +		if (rdev->is_virtfn)
> > > > +			rdev->rcfw.res_deinit = true;
> > > >  	}
> > > > +	mutex_unlock(&bnxt_re_dev_lock);
> > >
> > > This is super ugly. This driver already has bugs if it has a
> > > dependency on driver unbinding order as drivers can become unbound
> > > from userspace using sysfs or hot un-plug in any ordering.
> > >
> > The dependency is from the HW side and not from the driver side.
> > In some HW versions, the RoCE PF driver is allowed to allocate the
> > host memory for the VFs, and this dependency comes from that.
>
> > > If the VF driver somehow depends on the PF driver then destruction of
> > > the PF must synchronize and fence the VFs during its own shutdown.
> >
> > Do you suggest that the synchronization should be done in the stack
> > before invoking dealloc_driver for the VF?
>
> 'in the stack'? This is a driver problem. You can't assume ordering
> of driver detaches in Linux, and drivers should really not be
> co-ordinating across their instances.
>
> > > But this seems very strange, how can it work if the VF is in a VM
> > > or something and the PF driver is unplugged?
> >
> > This code is not handling the case where the VF is attached to a VM.
> > The first command to the HW after removal of the PF will fail with a
> > timeout. The driver will stop sending commands to the HW once it sees
> > this timeout. VF driver removal will proceed with cleaning up the host
> > resources without sending any commands to the FW, and will exit the
> > removal process.
>
> So why not use this for the host case as well? The timeout is too
> long?

Yeah. The timeout for the first command is 20 sec now. Maybe I can use a
smaller timeout in the unreg path.
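Something along these lines, perhaps (a minimal sketch only: res_deinit is
the flag from this patch, but the stand-in struct, the timeout constants and
the helper name are illustrative assumptions, not the actual bnxt_re code):

#include <linux/bitops.h>
#include <linux/errno.h>
#include <linux/jiffies.h>
#include <linux/wait.h>

#define BNXT_RE_CMD_TIMEOUT_MS		20000	/* normal 20 sec first-command wait */
#define BNXT_RE_UNLOAD_TIMEOUT_MS	500	/* shorter wait in the unreg path */

/* Stand-in for the rcfw command context; field names are assumptions. */
struct bnxt_re_rcfw_sketch {
	wait_queue_head_t waitq;
	unsigned long *cmdq_bitmap;	/* bit set while a cookie is in flight */
	bool res_deinit;		/* set once module unload has begun */
};

static int bnxt_re_wait_cmd_sketch(struct bnxt_re_rcfw_sketch *rcfw, u16 cookie)
{
	unsigned long jiff = msecs_to_jiffies(rcfw->res_deinit ?
					      BNXT_RE_UNLOAD_TIMEOUT_MS :
					      BNXT_RE_CMD_TIMEOUT_MS);

	/* FW completion clears the cookie bit and wakes the waiter */
	if (!wait_event_timeout(rcfw->waitq,
				!test_bit(cookie, rcfw->cmdq_bitmap), jiff))
		return -ETIMEDOUT;	/* caller stops issuing further commands */
	return 0;
}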
>
> > On hypervisor, if we don't set rdev->rcfw.res_deinit as done in this
> > patch, when VF removal is initiated the first command will time out
> > and the driver will stop sending any more commands to the FW and
> > proceed with removal. All VFs will exit in the same way; just that
> > each function will wait for one command to time out. Setting
> > rdev->rcfw.res_deinit was added as a hack to avoid this waiting time.
>
> The issue is that pci_driver_unregister undoes the driver binds in
> FIFO not LIFO order?
>
> What happens when the VF binds after the PF?

This is not dependent on PCI driver unregister. This particular issue
happens when the bnxt_re driver alone is unloaded and the new
ib_unregister_driver is invoked by bnxt_re in the mod_exit hook.
dealloc_driver for each IB device is called mostly in FIFO order (using
xa_for_each). Since the PF ib device was added first, it gets removed
first, and then the VF is removed.

After this discussion, I now feel it's better to remove the hack and let
the commands fail with a timeout and exit. The issue is seen only when
users try to unload bnxt_re alone, without destroying the VFs. The chances
of this type of usage are slim. Anyway, it's not a fatal error if any user
tries this sequence; just that the rmmod will take some time to exit.
Shall I repost with this hack removed?

> Jason
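P.S. For reference, the FIFO ordering mentioned above comes from the xarray
walk: xa_for_each() visits entries in ascending index order, and device
indices are handed out at registration time. A hedged sketch of the idea
(the device record and the unregister hook here are stand-ins, not the
actual ib_core implementation):

#include <linux/xarray.h>

/* Stand-in for an ib_device entry; real code walks struct ib_device. */
struct sketch_ibdev {
	int driver_id;
	void (*dealloc_driver)(struct sketch_ibdev *dev);
};

static void sketch_unregister_driver(struct xarray *devices, int driver_id)
{
	struct sketch_ibdev *dev;
	unsigned long index;

	/*
	 * xa_for_each() returns entries in ascending index order, so the
	 * PF, registered first, is torn down before its VFs - the FIFO
	 * behaviour described above.
	 */
	xa_for_each(devices, index, dev) {
		if (dev->driver_id != driver_id)
			continue;
		dev->dealloc_driver(dev);	/* per-device cleanup runs here */
	}
}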