On Sat, Feb 18, 2023 at 09:22:37PM +0800, Yang Su wrote:
> I figue out the reason of pci_bridge_secondary_bus_reset() why not work
> for NVIDIA GPU T4 which bind vfio passthrough hypervisor. I used the
> original func pci_bridge_secondary_bus_reset() not your patch, your
> patch remove bridge_d3 flag, the real reason is bridge_d3 flag.
>
> When I test the original func pci_bridge_secondary_bus_reset() in
> different machine node which all consist of the same type NVIDIA GPU T4,
> I found pci_bridge_wait_for_secondary_bus() bails out if the bridge_d3
> flag is not set, but I still confused why same gpu some machine node
> not set the bridge_d3 flag.
>
> I find the linux kernel only two func will init bridge_d3 which is func
> pci_pm_init() and pci_bridge_d3_update().
>
> If you know, please give me some hint.

It sounds like patch [1/3] in this series fixes the issue you're seeing.
That's good to hear.

The bridge_d3 flag indicates whether a PCIe port is allowed to runtime
suspend to D3:

First of all, pci_bridge_d3_possible() decides whether the PCIe port
may runtime suspend at all.  E.g. hotplug ports are generally not
allowed to runtime suspend except in certain known-to-work situations,
such as Thunderbolt or when specific ACPI properties are present.

Second, a device below the PCIe port may block the port from runtime
suspending to D3.  That is decided in pci_dev_check_d3cold().
E.g. if user space disallows D3 via the "d3cold_allowed" attribute
in sysfs, that will block D3 on PCIe ports in the ancestry.

If you're seeing different values for bridge_d3 on different machines,
even though the device below the PCIe port is always the same type
of GPU, then either pci_bridge_d3_possible() returns a different result
for the PCIe port in question (because it's a hotplug port or has
different ACPI properties etc), or one of its children blocks
runtime suspend to D3 (e.g. via sysfs).

Thanks,

Lukas
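
To compare machines, the sysfs side of the second check can be inspected
from user space.  Below is a minimal Python sketch, assuming a Linux
system where child devices appear as subdirectories of the port's sysfs
directory and expose a "d3cold_allowed" attribute.  The helper name
d3cold_blockers() is hypothetical and not a kernel or library API.  It
only reports devices with d3cold_allowed set to 0; it does not cover
pci_bridge_d3_possible() or any other condition the kernel evaluates.

```python
import os
import re

# Matches PCI addresses such as 0000:3b:00.0 (domain:bus:device.function).
BDF = re.compile(r"^[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]$")


def d3cold_blockers(port_dir):
    """Return addresses of devices below port_dir whose d3cold_allowed is 0.

    port_dir is the resolved sysfs directory of the PCIe port (hypothetical
    helper; the directory layout is an assumption about sysfs).
    """
    blockers = []
    for name in sorted(os.listdir(port_dir)):
        child = os.path.join(port_dir, name)
        # Only descend into entries that look like PCI device directories.
        if not BDF.match(name) or not os.path.isdir(child):
            continue
        try:
            with open(os.path.join(child, "d3cold_allowed")) as f:
                if f.read().strip() == "0":
                    blockers.append(name)
        except OSError:
            pass  # attribute missing or unreadable: nothing to report
        # A child may itself be a bridge with further devices below it.
        blockers.extend(d3cold_blockers(child))
    return blockers
```

On a real machine this could be called with the resolved path of the
port, e.g. os.path.realpath("/sys/bus/pci/devices/" + address), where
address is the port's domain:bus:device.function.  An empty result on
both machines would point towards pci_bridge_d3_possible() as the place
where they differ.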