On Tue, Feb 18, 2020 at 05:19:27PM +0530, Selvin Xavier wrote:
> On Sat, Jan 4, 2020 at 1:15 AM Jason Gunthorpe <jgg@xxxxxxxx> wrote:
> >
> > On Mon, Nov 25, 2019 at 12:39:33AM -0800, Selvin Xavier wrote:
> >
> > >  static void __exit bnxt_re_mod_exit(void)
> > >  {
> > > -        struct bnxt_re_dev *rdev, *next;
> > > -        LIST_HEAD(to_be_deleted);
> > > +        struct bnxt_re_dev *rdev;
> > >
> > > +        flush_workqueue(bnxt_re_wq);
> > >          mutex_lock(&bnxt_re_dev_lock);
> > > -        /* Free all adapter allocated resources */
> > > -        if (!list_empty(&bnxt_re_dev_list))
> > > -                list_splice_init(&bnxt_re_dev_list, &to_be_deleted);
> > > -        mutex_unlock(&bnxt_re_dev_lock);
> > > -        /*
> > > -         * Cleanup the devices in reverse order so that the VF device
> > > -         * cleanup is done before PF cleanup
> > > -         */
> > > -        list_for_each_entry_safe_reverse(rdev, next, &to_be_deleted, list) {
> > > -                dev_info(rdev_to_dev(rdev), "Unregistering Device");
> > > +        list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
> > >                  /*
> > > -                 * Flush out any scheduled tasks before destroying the
> > > -                 * resources
> > > +                 * Set unreg flag to avoid VF resource cleanup
> > > +                 * in module unload path. This is required because
> > > +                 * dealloc_driver for VF can come after PF cleaning
> > > +                 * the VF resources.
> > >                  */
> > > -                flush_workqueue(bnxt_re_wq);
> > > -                bnxt_re_dev_stop(rdev);
> > > -                bnxt_re_ib_uninit(rdev);
> > > -                /* Acquire the rtnl_lock as the L2 resources are freed here */
> > > -                rtnl_lock();
> > > -                bnxt_re_remove_device(rdev);
> > > -                rtnl_unlock();
> > > +                if (rdev->is_virtfn)
> > > +                        rdev->rcfw.res_deinit = true;
> > >          }
> > > +        mutex_unlock(&bnxt_re_dev_lock);
> >
> > This is super ugly. This driver already has bugs if it has a
> > dependency on driver unbinding order, as drivers can become unbound
> > from userspace using sysfs or hot-unplug in any ordering.
> >
> The dependency is from the HW side and not from the driver side.
> In some of the HW versions, the RoCE PF driver is allowed to allocate
> the host memory for the VFs, and this dependency comes from that.

> > If the VF driver somehow depends on the PF driver then destruction of
> > the PF must synchronize and fence the VFs during its own shutdown.
>
> Do you suggest that the synchronization should be done in the stack
> before invoking dealloc_driver for the VF?

'in the stack'? This is a driver problem. You can't assume ordering of
driver detaches in Linux, and drivers should really not be co-ordinating
across their instances.

> > But this seems very strange, how can it work if the VF is in a VM
> > or something and the PF driver is unplugged?
>
> This code is not handling the case where the VF is attached to a VM.
> The first command to the HW after the removal of the PF will fail with
> a timeout, and the driver will stop sending commands to the HW once it
> sees this timeout. VF driver removal will then proceed with cleaning up
> the host resources without sending any commands to the FW and exit the
> removal process.

So why not use this for the host case as well? The timeout is too long?

> On the hypervisor, if we don't set rdev->rcfw.res_deinit as done in
> this patch, then when VF removal is initiated the first command will
> time out and the driver will stop sending any more commands to the FW
> and proceed with removal. All the VFs will exit in the same way; it is
> just that each function will wait for one command to time out. Setting
> rdev->rcfw.res_deinit was added as a hack to avoid this waiting time.

The issue is that pci_driver_unregister undoes the driver binds in FIFO,
not LIFO, order?

What happens when the VF binds after the PF?

Jason
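
For illustration, a minimal sketch of the PF-side fencing discussed
above: it reuses the res_deinit mechanism from the patch, but drives it
from the PF's own teardown rather than from module exit, so the outcome
does not depend on the order in which drivers are unbound. The function
name bnxt_re_fence_vfs() and the idea of calling it from the PF's
remove/dealloc_driver path are assumptions made for the sketch, not the
actual bnxt_re code:

/*
 * Hypothetical sketch: fence VF instances from the PF's own teardown so
 * that VF cleanup never depends on driver unbind order. Reuses the
 * res_deinit flag introduced by the patch above.
 */
static void bnxt_re_fence_vfs(struct bnxt_re_dev *pf_rdev)
{
        struct bnxt_re_dev *rdev;

        dev_info(rdev_to_dev(pf_rdev), "Fencing VFs before PF teardown");

        mutex_lock(&bnxt_re_dev_lock);
        list_for_each_entry(rdev, &bnxt_re_dev_list, list) {
                /*
                 * A real implementation would restrict this to the VFs
                 * that belong to pf_rdev; the sketch marks every VF.
                 */
                if (rdev->is_virtfn)
                        rdev->rcfw.res_deinit = true;
        }
        mutex_unlock(&bnxt_re_dev_lock);
}

Called before the PF frees the host memory it allocated for its VFs,
something like this would let a later VF dealloc_driver skip FW commands
(and the command timeout) regardless of whether the VF or the PF is
unbound first.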