On Tue, Jan 16, 2024, Guilherme G. Piccoli wrote: > On 12/01/2024 22:33, Thinh Nguyen wrote: > > [...] > >> Yes, DRD indeed - but it's the **PCI PME interrupt** that uses MSI, not > >> the USB PCI device. Here is an output of lspci: > >> > >> $ lspci -vknns 04:00.3 > >> 04:00.3 USB controller [0c03]: Advanced Micro Devices, Inc. [AMD] > >> VanGogh USB0 [1022:163a] (prog-if fe [USB Device]) > >> Subsystem: Valve Software VanGogh USB0 [1e44:1776] > >> Flags: bus master, fast devsel, latency 0, IRQ 26 > >> Memory at 80000000 (64-bit, non-prefetchable) [size=1M] > >> Capabilities: [48] Vendor Specific Information: Len=08 <?> > >> Capabilities: [50] Power Management version 3 > >> Capabilities: [64] Express Endpoint, MSI 00 > >> Capabilities: [a0] MSI: Enable- Count=1/8 Maskable- 64bit+ > >> Capabilities: [c0] MSI-X: Enable- Count=8 Masked- > > > > Are you showing MSI enabled? That looks like MSI is disabled. > > > >> Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 > >> Len=010 <?> > >> Kernel driver in use: dwc3-pci > >> Kernel modules: dwc3_pci > >> > >> Now, I **guess** this is expected, but there is a difference in > >> /proc/interrupt between device and host mode: > >> > >> $ grep 26: /proc/interrupts | tr -s \ # device mode > >> [empty] > >> > >> $ grep 26: /proc/interrupts | tr -s \ # host mode > >> 26: 0 0 0 0 0 0 0 0 IO-APIC 25-fasteoi xhci-hcd:usb3 > > > > Looks like it's level interrupt here. I don't see "edge" or MSI > > interrupt. Is the pcie_pme share the same interrupt line as the usb > > controller? > > > > Hi Thinh, thanks again for your observations. > > So, the USB device interrupt is *level* triggered, no MSI involved > indeed, you pointed in the above lspci output. > > However, the *PME PCIe interrupt is MSI* by default. And we can shift > that to level through a kernel parameter ("pcie_pme=nomsi"), the net > effect is the same, i.e., issue is still present. I'll paste here the > lspci of the PCI bridge device that triggers the wakeup: > > $ cat /sys/power/pm_wakeup_irq > 28 Ok. You're referring to the system PME here and not from the controller's PMU. Then the question related to hibernation is irrelevant. I understand the problem better now. > > > $ cat /proc/interrupts |grep -w 28: | tr -s \ > 28: 0 0 1 0 0 0 0 0 PCI-MSI-0000:00:08.1 0-edge PCIe PME [0] > > > $ lspci -knnvs 0000:00:08.1 > 00:08.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] VanGogh > Internal PCIe GPP Bridge to Bus [1022:1648] (prog-if 00 [Normal decode]) > Subsystem: Advanced Micro Devices, Inc. [AMD] VanGogh Internal > PCIe GPP Bridge to Bus [1022:1648] > Flags: bus master, fast devsel, latency 0, IRQ 28 > Bus: primary=00, secondary=04, subordinate=04, sec-latency=0 > I/O behind bridge: 1000-1fff [size=4K] [16-bit] > Memory behind bridge: 80000000-803fffff [size=4M] [32-bit] > Prefetchable memory behind bridge: f8e0000000-f8f01fffff > [size=258M] [32-bit] > Capabilities: [50] Power Management version 3 > Capabilities: [58] Express Root Port (Slot-), MSI 00 > Capabilities: [a0] MSI: Enable+ Count=1/1 Maskable- 64bit+ > Capabilities: [c0] Subsystem: Advanced Micro Devices, Inc. [AMD] > VanGogh Internal PCIe GPP Bridge to Bus [1022:1648] > Capabilities: [100] Vendor Specific Information: ID=0001 Rev=1 > Len=010 <?> > Capabilities: [270] Secondary PCI Express > Capabilities: [400] Data Link Feature <?> > Capabilities: [410] Physical Layer 16.0 GT/s <?> > Capabilities: [440] Lane Margining at the Receiver <?> > Kernel driver in use: pcieport > > > [0] The /proc/interrupts increases by 1 at every wakeup attempt when in > MSI mode. No increase happens if in level mode. > > > Another output that could be helpful to understand the PCI topology is > the lspci -t, I have a pastebin for that (also attached here): > https://urldefense.com/v3/__https://termbin.com/dccj__;!!A4F2R9G_pg!Z5hf2_fTZMHk42qd4Fb22mjHgFJuFWQz7iz-LgurN38X5JG0zk9gdyoZb0QnqE2eKschkeKt2eRTCIjSv9KutQ$ > > > > I'm not sure how the STEAM DECK is designed. Does the OOT driver manages > > the power state request outside of the PCI PM for device mode and not > > just reading the port state? If that's the case, the issue could be in > > the OOT driver? > > If you want to take a look in how the OOT driver switches the dwc3 mode > between device/host, here is the code: > https://urldefense.com/v3/__https://lore.kernel.org/lkml/20220206022023.376142-1-andrew.smirnov@xxxxxxxxx/__;!!A4F2R9G_pg!Z5hf2_fTZMHk42qd4Fb22mjHgFJuFWQz7iz-LgurN38X5JG0zk9gdyoZb0QnqE2eKschkeKt2eRTCIg_fm8DMQ$ > > In fact, this was split in a single small extcon driver in the Valve > tree, but the code is the same as above, take a look specially in the > functions steamdeck_notify() and steamdeck_usb_role_work(). > > Notice *all* my tests are with mainline (6.7-rc6+) and *without* this > OOT driver, so I'm not sure how could it be related...the OOT just > relies on ACPI messages to switch automatically between host/device > mode, and in my tests, I'm doing this manually. > > > > [...] > >> There was no hibernation (S4 state) involved, just to clarify - it's a > >> mem_sleep /suspend to RAM operation, usually called deep sleep / S3. And > >> indeed, the PME seems to be generated and prevents the mem_sleep (or it > >> does sleep but instantly wakes-up, which is the case with level interrupts). > > > > I was referring to the controller hibernation and not system > > hibernation. S3 will cause the xhci driver to put the host controller to > > go into hibernation and send a power state change request through PCI > > PM. Usually the power state change would put the core in D3 and passes > > the control to the PMU, and PME generation can happen then. Similar > > behavior applies to device mode, but the power state change may be tied > > to a different interface than PCI PM? > > > > Perhaps you're missing the logic/flow to update the power state change > > when in device mode. And perhaps putting the controller in host mode > > passes the control to xhci allowing the driver to properly manage the > > power state preventing PME from generated? It's a little difficult to > > debug without more info. > >> Did you seek help from the vendor? > > Thanks for the clarification about the hibernation concept! > > "perhaps putting the controller in host mode passes the control to xhci" > <- this is true if we *manually* set the device to host mode in debugfs > - xhci_hcd takes control, and all works fine. > > But it is *not the case* with the quirk - we just write to that register > in the last step of dwc3 suspend. For example, I just tried here with no > xhci module available, with dwc3 in device mode - and suspend works fine > _with the quirk_. > > Would be very interesting to have the datasheet of this device in hands > to determine what this write does in the controller exactly, but I have > no access to it. That would likely explain a lot about why the quirk works. > > Regarding HW vendor support, as Vivek said, they're looped it seems, but > my goal with the quirk (that I've now restricted to this specific device > and will resubmit) is to unblock the usage of DRD in the Steam Deck for > the users in the short-term. > It would be better to see if we can root cause the problem first. We should avoid temporary solution if we can. Once a quirk is introduced, it's challenging to remove the quirk. Please confirm that the STEAM DECK is soft-disconnected when you put it in suspend. That's the current implementation of the dwc3. If not, then it's possible the activity over the wire can wake up the STEAM DECK since the controller is still active. If it is soft-disconnected, but the PME is still generated after system suspend, can you check if that's also the case when physically disconnected? Thanks, Thinh