> From: Long Li <longli@xxxxxxxxxxxxx> > Sent: Friday, April 23, 2021 11:40 AM > To: Dexuan Cui <decui@xxxxxxxxxxxxx>; longli@xxxxxxxxxxxxxxxxx; KY > > > Subject: RE: [Patch v2 2/2] PCI: hv: Remove unused refcount and supporting > > functions for handling bus device removal > > > > > From: longli@xxxxxxxxxxxxxxxxx <longli@xxxxxxxxxxxxxxxxx> > > > Sent: Wednesday, April 21, 2021 10:46 PM > > > > > > With the new method of flushing/stopping the workqueue before doing > > > bus removal, the old mechanisum of using refcount and wait for > > > completion > > > > mechanisum -> mechanism > > > > > is no longer needed. Remove those dead code. > > > > > > Signed-off-by: Long Li <longli@xxxxxxxxxxxxx> > > > > The patch looks good to me. BTW, can we also remove get_pcichild() and > > put_pcichild() in an extra patch? I suspect we don't really need those either. > > Those two functions are for protecting accessing to the devices on the hbus. > There are interactions from PCI layer that need guarantee from hbus that the > device is present at the time of access. > > Why do you think we don't' need those? IMO there is proper locking and synchronization logic in the PCI layer, so I don't think the 'hpdev' struct can vanish when it's being accessed from the PCI layer. I think the 'hpdev' struct can only be freed in two scenarios: 1) the PCI device is removed: the VM receives the PCI_EJECT message, hv_eject_device_work() calls pci_stop_and_remove_bus_device() to deregister the pci_dev from the PCI layer, and then does other cleanup, and finally call kfree(hpdev) in the third put_pcichild() in hv_eject_device_work(). 2) the pci-hyperv driver is unloaded: in this case, hv_pci_remove() calls pci_remove_root_bus() to deregister the pci_dev, and then calls hv_pci_bus_exit() -> hv_pci_start_relations_work(), and eventually pci_devices_present_work() decreases the ref counter to zero and free the 'hpdev'. In both the case, when the hpdev or the pdev is still being used by the PCI layer, I think the pci_stop_and_remove_bus_device() and pci_remove_root_bus() should be blocked, i.e. the hpdev can't be freed even if we don't have the ref counter mechanism. For example. we know the 'lspci' program can read the PCI device's config space directly via the sysfile /sys/bus/pci/devices/XXXX/config; when 'lspci' is reading the config space, the code path can be: do_syscall_64 ksys_pread64 vfs_read new_sync_read kernfs_fop_read_iter kernfs_file_read_iter kernfs_get_active -> atomic_inc_unless_negative(&kn->active). pci_read_config pci_user_read_config_dword hv_pcifront_read_config _hv_pcifront_read_config At this moment, if the host tries to remove the PCI device, the PCI_EJECT code path will be blocked due to the kn->active: hv_eject_device_work() pci_stop_and_remove_bus_device() pci_stop_bus_device pci_remove_sysfs_dev_files kernfs_remove_by_name_ns __kernfs_remove kernfs_drain wait_event(root->deactivate_waitq, atomic_read(&kn->active) == KN_DEACTIVATED_BIAS); I don't check all the scenarios and code paths, but generally speaking I suppose the PCI subsystem should already have the proper locking and synchronization logic we need here. PS, I'm not sure if the host can remove the device by only sending a PCI_BUS_RELATIONS message with bus_rel->device_count == 0 (i.e. no PCI_EJECT message): in this case, if we don't have the ref counter for hpdev, pci_devices_present_work() frees the hpdev before calling pci_scan_child_bus() and this can be a potential race condition. But IMO this can be easily fixed by moving "free the hpdev" to a later place, afer pci_scan_child_bus() is called. Thanks, -- Dexuan