Re: [PATCH v13 1/3] PCI: make pci_dev_is_disconnected() helper public for other drivers

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 3/1/2024 6:33 AM, Bjorn Helgaas wrote:
On Thu, Feb 29, 2024 at 09:58:43AM +0800, Ethan Zhao wrote:
On 2/27/2024 7:05 AM, Bjorn Helgaas wrote:
On Thu, Feb 22, 2024 at 08:54:54PM +0800, Baolu Lu wrote:
On 2024/2/22 17:02, Ethan Zhao wrote:
Make pci_dev_is_disconnected() public so that it can be called from
Intel VT-d driver to quickly fix/workaround the surprise removal
unplug hang issue for those ATS capable devices on PCIe switch downstream
hotplug capable ports.

Beside pci_device_is_present() function, this one has no config space
space access, so is light enough to optimize the normal pure surprise
removal and safe removal flow.

Tested-by: Haorong Ye<yehaorong@xxxxxxxxxxxxx>
Signed-off-by: Ethan Zhao<haifeng.zhao@xxxxxxxxxxxxxxx>
---
    drivers/pci/pci.h   | 5 -----
    include/linux/pci.h | 5 +++++
    2 files changed, 5 insertions(+), 5 deletions(-)
Hi PCI subsystem maintainers,

The iommu drivers (including, but not limited to, the Intel VT-d driver)
require a helper to check the physical presence of a PCI device in two
scenarios:

- During the iommu_release_device() path: This ensures the device is
    physically present before sending device TLB invalidation to device.
This wording is fundamentally wrong.  Testing
pci_dev_is_disconnected() can never ensure the device will still be
present by the time a TLB invalidation is sent.
The logic of testing pci_dev_is_disconnected() in patch [2/3] works
in the opposite:

1. if pci_dev_is_disconnected() return true, means the device is in
    the process of surprise removal handling, adapter already been
    removed from the slot.

2. for removed device, no need to send ATS invalidation request to it.
    removed device lost power, its devTLB wouldn't be valid anymore.

3. if pci_dev_is_disconnected() return false, the device is *likely*
    to be removed at any momoment after this function called.
    such case will be treated in the iommu ITE fault handling, not to
    retry the timeout request if device isn't present (patch [3/3]).

The device may be removed after the pci_dev_is_disconnected() test and
before a TLB invalidate is sent.
even in the process while TLB is invalidating.

This is why I hesitate to expose pci_dev_is_disconnected() (and
pci_device_is_present(), which we already export) outside
drivers/pci/.  They both lead to terrible mistakes like relying on the
false assumption that the result will remain valid after the functions
return, without any recognition that we MUST be able to deal with the
cases where that assumption is broken.
Yup, your concern is worthy ,but isn't happening within this patchset.
OK, I acked the patch.

Great !


I guess my complaint is really with pci_device_is_present() because
that's even harder to use correctly.

pci_device_is_present():
   slow (may do config access to device)
   true  => device *was* present in the recent past, may not be now
   false => device is not accessible

so the 'false' result is reliable for post-calling code, then give up
more attempt of the same request. the usage in patch[3/3]


pci_dev_is_disconnected():
   fast (doesn't touch device)
   true  => device is not accessible

also we are relying on the 'true' result returned, used in  patch[2/3].

   false => basically means nothing

I guess they're both hard ;)

seems I didn't mess them up. :)

Thanks,
Ethan


Bjorn




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux