From: Long Li <longli@xxxxxxxxxxxxx> Sent: Wednesday, April 21, 2021 12:57 PM
> > From: longli@xxxxxxxxxxxxxxxxx <longli@xxxxxxxxxxxxxxxxx> Sent:
> > Monday, April 19, 2021 12:21 PM
> > >
> > > On removing the device, any work item (hv_pci_devices_present() or
> > > hv_pci_eject_device()) scheduled on workqueue hbus->wq may still be
> > > running and race with hv_pci_remove().
> > >
> > > This can happen because the host may send PCI_EJECT or
> > > PCI_BUS_RELATIONS(2) and decide to rescind the channel immediately
> > > after that.
> > >
> > > Fix this by flushing/stopping the workqueue of hbus before doing hbus
> > > remove.
> >
> > I can see that this change follows the same pattern as in hv_pci_suspend().
> > The comments there give a full explanation of the issue and the solution. But
> > interestingly, the current code also has a reference count mechanism on the
> > hbus. And code near the end of hv_pci_remove() decrements the reference
> > count and then waits for all users to finish before destroying the workqueue.
> > With this change, is this reference counting mechanism still needed? If the
> > workqueue has already been emptied, it seems like the
> > wait_for_completion() near the end of hv_pci_remove() would never be
> > waiting for anything. It makes me wonder if moving the reference count
> > checking code from near the end of hv_pci_remove() up to near the beginning
> > would solve the problem as well (and maybe in hv_pci_suspend also?).
>
> Yes I think put_hvpcibus() and get_hvpcibus() can be removed, as we have changed to use
> a dedicated workqueue for hbus since they were introduced.
>
> But we still need to call tasklet_disable/enable() the same way hv_pci_suspend() does, the
> reason is that we need to protect hbus->state. This value needs to be consistent for the
> driver. For example, a CPU may decide to schedule a work on a work queue that we just
> flushed or destroyed, by reading the wrong hbus->state.
>

Yes, I would agree the tasklet disable/enable are needed, especially since
tasklet_disable() is what ensures that the tasklet is not currently running.

If the hbus ref counting isn't needed any longer, I would strongly recommend
adding a patch to the series that removes it. This synchronization stuff is
hard enough to understand and reason about; having a leftover mechanism that
doesn't really do anything useful makes it nearly impossible. :-)

Dexuan -- I'm hoping you can take a look as well and see if you agree.

Michael

> >
> > Michael
> >
> > >
> > > Signed-off-by: Long Li <longli@xxxxxxxxxxxxx>
> > > ---
> > >  drivers/pci/controller/pci-hyperv.c | 11 +++++++++++
> > >  1 file changed, 11 insertions(+)
> > >
> > > diff --git a/drivers/pci/controller/pci-hyperv.c
> > > b/drivers/pci/controller/pci-hyperv.c
> > > index 27a17a1e4a7c..116815404313 100644
> > > --- a/drivers/pci/controller/pci-hyperv.c
> > > +++ b/drivers/pci/controller/pci-hyperv.c
> > > @@ -3305,6 +3305,17 @@ static int hv_pci_remove(struct hv_device
> > > *hdev)
> > >
> > >  	hbus = hv_get_drvdata(hdev);
> > >  	if (hbus->state == hv_pcibus_installed) {
> > > +		tasklet_disable(&hdev->channel->callback_event);
> > > +		hbus->state = hv_pcibus_removing;
> > > +		tasklet_enable(&hdev->channel->callback_event);
> > > +
> > > +		flush_workqueue(hbus->wq);
> > > +		/*
> > > +		 * At this point, no work is running or can be scheduled
> > > +		 * on hbus-wq. We can't race with hv_pci_devices_present()
> > > +		 * or hv_pci_eject_device(), it's safe to proceed.
> > > +		 */
> > > +
> > >  		/* Remove the bus from PCI's point of view. */
> > >  		pci_lock_rescan_remove();
> > >  		pci_stop_root_bus(hbus->pci_bus);
> > > --
> > > 2.27.0
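
For anyone following the ordering argument in this thread, here is a rough single-threaded userspace model of it. This is only a sketch and not driver code: every name in it (model_bus, model_callback, model_flush, model_remove) is made up for illustration, and it does not reproduce real concurrency. It only shows why the state must change before the flush: once the state is "removing", the callback stand-in refuses to queue anything, so after the flush the queue stays empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model; none of these names exist in pci-hyperv.c. */
enum bus_state { BUS_INSTALLED, BUS_REMOVING };

#define MAX_WORK 8

struct model_bus {
	enum bus_state state;
	int pending[MAX_WORK];	/* queued items, stand-in for hbus->wq */
	size_t npending;
	int completed;		/* number of work items that have run */
};

/* Stand-in for the channel callback: queue work only while installed. */
bool model_callback(struct model_bus *bus, int item)
{
	if (bus->state != BUS_INSTALLED || bus->npending == MAX_WORK)
		return false;
	bus->pending[bus->npending++] = item;
	return true;
}

/* Stand-in for flush_workqueue(): run everything already queued. */
void model_flush(struct model_bus *bus)
{
	for (size_t i = 0; i < bus->npending; i++)
		bus->completed++;	/* "run" one queued item */
	bus->npending = 0;
}

/* Stand-in for the first steps of hv_pci_remove() in the patch. */
void model_remove(struct model_bus *bus)
{
	/*
	 * In the driver this assignment sits between tasklet_disable()
	 * and tasklet_enable(), so the callback cannot be running while
	 * the state changes. The model has no concurrency, so a plain
	 * store is enough here.
	 */
	bus->state = BUS_REMOVING;

	/* Drain what was queued before the state change. */
	model_flush(bus);

	/* From here on nothing is pending and nothing new can be queued. */
}
```

If the two steps in model_remove() were swapped, a callback arriving between the flush and the state change could still queue an item, which is the race described above.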