On Thu, Feb 29, 2024 at 09:58:43AM +0800, Ethan Zhao wrote: > On 2/27/2024 7:05 AM, Bjorn Helgaas wrote: > > On Thu, Feb 22, 2024 at 08:54:54PM +0800, Baolu Lu wrote: > > > On 2024/2/22 17:02, Ethan Zhao wrote: > > > > Make pci_dev_is_disconnected() public so that it can be called from > > > > Intel VT-d driver to quickly fix/workaround the surprise removal > > > > unplug hang issue for those ATS capable devices on PCIe switch downstream > > > > hotplug capable ports. > > > > > > > > Beside pci_device_is_present() function, this one has no config space > > > > space access, so is light enough to optimize the normal pure surprise > > > > removal and safe removal flow. > > > > > > > > Tested-by: Haorong Ye<yehaorong@xxxxxxxxxxxxx> > > > > Signed-off-by: Ethan Zhao<haifeng.zhao@xxxxxxxxxxxxxxx> > > > > --- > > > > drivers/pci/pci.h | 5 ----- > > > > include/linux/pci.h | 5 +++++ > > > > 2 files changed, 5 insertions(+), 5 deletions(-) > > > Hi PCI subsystem maintainers, > > > > > > The iommu drivers (including, but not limited to, the Intel VT-d driver) > > > require a helper to check the physical presence of a PCI device in two > > > scenarios: > > > > > > - During the iommu_release_device() path: This ensures the device is > > > physically present before sending device TLB invalidation to device. > > This wording is fundamentally wrong. Testing > > pci_dev_is_disconnected() can never ensure the device will still be > > present by the time a TLB invalidation is sent. > > The logic of testing pci_dev_is_disconnected() in patch [2/3] works > in the opposite: > > 1. if pci_dev_is_disconnected() return true, means the device is in > the process of surprise removal handling, adapter already been > removed from the slot. > > 2. for removed device, no need to send ATS invalidation request to it. > removed device lost power, its devTLB wouldn't be valid anymore. > > 3. if pci_dev_is_disconnected() return false, the device is *likely* > to be removed at any momoment after this function called. > such case will be treated in the iommu ITE fault handling, not to > retry the timeout request if device isn't present (patch [3/3]). > > > The device may be removed after the pci_dev_is_disconnected() test and > > before a TLB invalidate is sent. > > even in the process while TLB is invalidating. > > > This is why I hesitate to expose pci_dev_is_disconnected() (and > > pci_device_is_present(), which we already export) outside > > drivers/pci/. They both lead to terrible mistakes like relying on the > > false assumption that the result will remain valid after the functions > > return, without any recognition that we MUST be able to deal with the > > cases where that assumption is broken. > > Yup, your concern is worthy ,but isn't happening within this patchset. OK, I acked the patch. I guess my complaint is really with pci_device_is_present() because that's even harder to use correctly. pci_device_is_present(): slow (may do config access to device) true => device *was* present in the recent past, may not be now false => device is not accessible pci_dev_is_disconnected(): fast (doesn't touch device) true => device is not accessible false => basically means nothing I guess they're both hard ;) Bjorn