Re: [PATCH 3/3] PCI/DPC: Await readiness of secondary bus after reset

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Thu, Jan 12, 2023 at 8:39 PM Yang Su <yang.su@xxxxxxxxxxxxxxxxx> wrote:
>
> Hi Bjorn,
>
> This email is I discussed with Lukas previously, but now I found I forget to CC you.
>
> I also test pci_bridge_wait_for_secondary_bus in NVIDIA GPU T4 which bind vfio
>
> passthrough in VMM, I found the pci_bridge_wait_for_secondary_bus can not wait
>
> the enough time as pci spec requires, the reasons are described as below.

Hi Yang,

When you're discussing Linux patches, please always keep the cc list
so the rest of us can see what's happening.  Also please use plain
ASCII email (not HTML, etc) because the list archives often discard
fancy emails.  Hints:

http://vger.kernel.org/majordomo-info.html
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
https://people.kernel.org/tglx/notes-about-netiquette

I'll wait for you and Lukas to continue this conversation on the mailing list.

Bjorn

> -------- 转发的消息 --------
> 主题: Re: [PATCH 2/3] PCI: Unify delay handling for reset and resume
> 日期: Wed, 4 Jan 2023 12:46:17 +0800
> From: Yang Su <yang.su@xxxxxxxxxxxxxxxxx>
> 收件人: Lukas Wunner <lukas@xxxxxxxxx>
>
>
> Hi Lukas,
>
> When some device such as NVIDIA GPU T4 bind vfio to passthrough in VMM, the vfio will do reset for the divice, but some device
>
> such as NVIDIA GPU T4 only support secondary bus reset (SBR), and then will call pci_reset_secondary_bus.
>
> The call path is below as figure shows,
>
>
> There are there reasons I think your modify for pci_reset_secondary_bus is not ok.
>
> 1. As I describe for you, if the device not bridge call pci_reset_secondary_bus,
>
> the device will enter pci_bridge_wait_for_secondary_bus, but in pci_bridge_wait_for_secondary_bus
>
> the device is not judged as pci bridge, will directly return.
>
> if (!pci_is_bridge(dev) || !dev->bridge_d3) {
>
> above code will do the judge.
>
>
> 2. In pci_bridge_wait_for_secondary_bus, pci_bus_max_d3cold_delay will take count of wrong time delay,
>
> such as NVIDIA GPU T4 is not pci bridge, so the subordinate is none, pci_bus_max_d3cold_delay
>
> set the min_delay is 100, max_delay is 0, here is the bug, after list_for_each_entry() in pci_bus_max_d3cold_delay,
>
> the min_delay will be 0, the max_delay also 0, the pci_bus_max_d3cold_delay return is surely 0.
>
>
> 3. pci_dev_wait must be saved, and cannot be replaced as pci_bridge_wait_for_secondary_bus.
>
> Because the pcie spec requires the device to check CRS, and pci_dev_wait do this thing.
>
> "
>
> Note: Software should use 100 ms wait periods only if software enables CRS Software Visibility.
>
> Otherwise, Completion timeouts, platform timeouts, or lengthy processor instruction stalls may result.
>
> See the Configuration Request Retry Status Implementation Note in Section 2.3.1.
>
> "
>
>
> void pci_bridge_wait_for_secondary_bus(struct pci_dev *dev)
> {
>     struct pci_dev *child;
>     int delay;
>
>     if (pci_dev_is_disconnected(dev))
>         return;
>
>     if (!pci_is_bridge(dev) || !dev->bridge_d3)
>         return;
>
> ...
>
>     /* Take d3cold_delay requirements into account */
>     delay = pci_bus_max_d3cold_delay(dev->subordinate);
>
> }
>
>
>   void pci_reset_secondary_bus(struct pci_dev *dev)
> @@ -5058,15 +5060,6 @@ void pci_reset_secondary_bus(struct pci_dev *dev)
>
>   ctrl &= ~PCI_BRIDGE_CTL_BUS_RESET;
>   pci_write_config_word(dev, PCI_BRIDGE_CONTROL, ctrl);
> -
> - /*
> - * Trhfa for conventional PCI is 2^25 clock cycles.
> - * Assuming a minimum 33MHz clock this results in a 1s
> - * delay before we can consider subordinate devices to
> - * be re-initialized.  PCIe has some ways to shorten this,
> - * but we don't make use of them yet.
> - */
> - ssleep(1);
>   }
>
>   void __weak pcibios_reset_secondary_bus(struct pci_dev *dev)
> @@ -5085,7 +5078,8 @@ int pci_bridge_secondary_bus_reset(struct pci_dev *dev)
>   {
>   pcibios_reset_secondary_bus(dev);
>
> - return pci_dev_wait(dev, "bus reset", PCIE_RESET_READY_POLL_MS);
> + return pci_bridge_wait_for_secondary_bus(dev, "bus reset",
> + PCIE_RESET_READY_POLL_MS);
>   }
>
>
> 在 2023/1/4 01:25, Lukas Wunner 写道:
>
> Hi Yang Su!
>
> Thanks for reaching out.
>
> On Tue, Jan 03, 2023 at 07:45:25PM +0800, Yang Su wrote:
>
> I think your modify in pci_reset_secondary_bus is not ok. I test pci_bridge_secondary_bus_reset used in pci_reset_secondary_bus,
> there will be an error is a device is not pci bridge device in pci_reset_secondary_buscall pci_bridge_secondary_bus_reset,
> pci_bridge_secondary_bus_reset judge the device is not pci bridge device and just return and will do nothing.
> So just use pci_bridge_secondary_bus_reset to replace ssleep(1) is not ok.
>
> Hm, I don't quite understand.
>
> How do you invoke pci_bridge_secondary_bus_reset() with a non-bridge device?
> I followed all the code paths leading to pci_bridge_secondary_bus_reset()
> and it seems to me none of them is calling the function with an endpoint
> device.  What am I missing?
>
> Thanks,
>
> Lukas
>
> 在 2023/1/13 06:35, Bjorn Helgaas 写道:
>
> On Sat, Dec 31, 2022 at 07:33:39PM +0100, Lukas Wunner wrote:
>
> We're calling pci_bridge_wait_for_secondary_bus() after performing a
> Secondary Bus Reset, but neglect to do the same after coming out of a
> DPC-induced Hot Reset.  As a result, we're not observing the delays
> prescribed by PCIe r6.0 sec 6.6.1 and may access devices on the
> secondary bus before they're ready.  Fix it.
>
> Tested-by: Ravi Kishore Koppuravuri <ravi.kishore.koppuravuri@xxxxxxxxx>
>
> I assume this patch is the one that makes the difference for the
> Intel Ponte Vecchio HPC GPU?  Is there a URL to a problem report, or
> at least a sentence or two we can include here to connect the patch
> with the problem users may see?  Most people won't know how to
> recognize accesses to devices on the secondary bus before they're
> ready.
>
> Bjorn




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux