ohh actually, I was testing with a kernel without this workaround applied, so I need to retest it later.

On Wed, Aug 29, 2018 at 2:40 PM, Karol Herbst <kherbst@xxxxxxxxxx> wrote:
> On Tue, Aug 28, 2018 at 4:23 AM, Daniel Drake <drake@xxxxxxxxxxxx> wrote:
>> On Fri, Aug 24, 2018 at 11:42 PM, Peter Wu <peter@xxxxxxxxxxxxx> wrote:
>>> Are these systems also affected through runtime power management? For
>>> example:
>>>
>>>     modprobe nouveau # should enable runtime PM
>>>     sleep 6 # wait for runtime suspend to kick in
>>>     lspci -s1: # runtime resume by reading PCI config space
>>>
>>> On laptops from about 2015-2016 with a GTX 9xxM this sequence results in
>>> hangs on various laptops
>>> (https://bugzilla.kernel.org/show_bug.cgi?id=156341).
>>
>> This works fine here. I'm facing a different issue.
>>
>>>> After a lot of experimentation I found a workaround: during resume,
>>>> set the value of PCI_PREF_BASE_UPPER32 to 0 on the parent PCI bridge.
>>>> Easily done in drivers/pci/quirks.c. Now all nvidia stuff works fine.
>>>
>>> I am curious, how did you discover this? While this could work, perhaps
>>> there are alternative workarounds/fixes?
>>
>> Based on the observation that the following procedure works fine (note
>> the addition of step 3):
>>
>> 1. Boot
>> 2. Suspend/resume
>> 3. echo rescan > /sys/bus/pci/devices/0000:00:1c.0/rescan
>> 4. Load nouveau driver
>> 5. Start X
>>
>> I worked through the rescan codepath until I had isolated the specific
>> code which magically makes things work (in pci_bridge_check_ranges).
>>
>> Having found that, step 3 in the above test procedure can be replaced
>> with a simple:
>>     setpci -s 00:1c.0 0x28.l=0
>>
>>> When you say "parent PCI" bridge, is that actually the device you see in
>>> "lspci -tv"?
>>> On a Dell XPS 9560, the GPU is under a different device:
>>>
>>> -[0000:00]-+-00.0 Intel Corporation Xeon E3-1200 v6/7th Gen Core Processor Host Bridge/DRAM Registers
>>>            +-01.0-[01]----00.0 NVIDIA Corporation GP107M [GeForce GTX 1050 Mobile]
>>>
>>> 00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) [8086:1901] (rev 05)
>>
>> Yes, it's the parent bridge shown by lspci. The address of this varies
>> from system to system.
>>
>>>> 1. Is the Intel PCI bridge misbehaving here? Why does writing the same
>>>> value of PCI_PREF_BASE_UPPER32 make any difference at all?
>>>
>>> At what point in the suspend code path did you insert this write? It is
>>> possible that the write somehow acted as a fence/memory barrier?
>>
>> static void quirk_pref_base_upper32(struct pci_dev *dev)
>> {
>>         u32 pref_base_upper32;
>>         pci_read_config_dword(dev, PCI_PREF_BASE_UPPER32, &pref_base_upper32);
>>         pci_write_config_dword(dev, PCI_PREF_BASE_UPPER32, pref_base_upper32);
>> }
>> DECLARE_PCI_FIXUP_RESUME(PCI_VENDOR_ID_INTEL, 0x9d10, quirk_pref_base_upper32);
>>
>
> this workaround fixes runtime suspend/resume on my laptop as well...
> but what baffles me most is, unloading nouveau does as well. I will
> see what bits are exactly "fixing" it in the nouveau unloading path
> and maybe we can get around this issue inside nouveau. It would be
> still nice to get to the root cause of all of this as there are three
> known workarounds (at least on my system):
> 1. unload nouveau
> 2. skip setting the D3 power state via PCI config space (and still do
> the ACPI bits)
> 3. write value of PCI_PREF_BASE_UPPER32
>
>> I don't think it's acting as a barrier. I tried changing this code to
>> rewrite other registers such as PCI_PREF_MEMORY_BASE and that makes
>> the bug come back.
>>
>>>> 2. Who is responsible for saving and restoring PCI bridge
>>>> configuration during suspend and resume? Linux? ACPI? BIOS?
>>>
>>> Not sure about PCI bridges, but at least for the PCI Express Capability
>>> registers, it is in control of the OS when control is granted via the
>>> ACPI _OSC method.
>>
>> I guess you are referring to pci_save_pcie_state(). I can't see
>> anything equivalent for the bridge registers.
>>
>>> As Windows is probably not affected by this issue, a change must be
>>> possible to make Linux more compatible with Windows. Though I am not
>>> sure what change is needed.
>>
>> I agree. There's a definite difference with Windows here and it would
>> be great to find a fix along those lines.
>>
>>> I recently compared PCI configuration space access and ACPI method
>>> invocation using QEMU + VFIO with Linux 4.18, Windows 7 and Windows 10
>>> (1803). There were differences like disabling MSI/interrupts before
>>> suspend, setting the Enable Clock Power Management bit in PCI Express
>>> Link Control and more, but applying these changes were so far not really
>>> successful.
>>
>> Interesting. Do you know any way that I could spy on Windows' accesses
>> to the PCI bridge registers?
>> Looking at https://wiki.archlinux.org/index.php/PCI_passthrough_via_OVMF
>> I suspect VFIO would not help me here.
>> It says:
>> Note: If they are grouped with other devices in this manner, pci
>> root ports and bridges should neither be bound to vfio at boot, nor be
>> added to the VM.
>>
>> Thanks
>> Daniel
>> _______________________________________________
>> Nouveau mailing list
>> Nouveau@xxxxxxxxxxxxxxxxxxxxx
>> https://lists.freedesktop.org/mailman/listinfo/nouveau