On Wed, Feb 7, 2024 at 9:05 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> Can you run "sudo lspci -vvxxxx -s00:06.0" before putting the Root
> Port in D3hot, and then again after putting it back in D0 (when NVMe
> is inaccessible), and attach both outputs to the bugzilla?

Done: https://bugzilla.kernel.org/show_bug.cgi?id=215742#c21

> Given that D3cold is just "main power off," and obviously the Root
> Port *can* transition from D3cold to D0 (at initial platform power-up
> if nothing else), this seems kind of strange and makes me think we may
> not completely understand the root cause, e.g., maybe some config
> didn't get restored.
>
> But the fact that Windows doesn't use D3cold in this case suggests
> that either (1) Windows has a similar quirk to work around this, or
> (2) Windows decides whether to use D3cold differently than Linux does.
>
> I have no data, but (1) seems sort of unlikely.  In case it turns out
> to be (2) and we figure out how to fix it that way someday, can you
> add the output of "sudo lspci -vvxxxx" of the system to the bugzilla?

https://bugzilla.kernel.org/show_bug.cgi?id=215742#c27

Some other interesting observations from Windows, observed via
socwatch & VTune:

On affected BIOS versions:
- CPU does not go into the lowest power state PC10 during suspend - it
  only reaches PC8.
- SLP_S0# signal is not asserted (which follows from it not reaching
  PC10).
- NVMe device in D0 and the HDD LED briefly blinks every 1-2 seconds
  (can't recall if it is a regular or irregular blink)

On latest BIOS version:
- PC10 reached and SLP_S0# asserted during suspend, but only for about
  25% of the suspend time
- NVMe device in D0 and the HDD LED briefly blinks every 1-2 seconds
  (can't recall if it is a regular or irregular blink)

The LED blinking leaves me wondering if there is something "using" the
disk during suspend in Windows, so that's why it doesn't try to power
it down even on the original version with StorageD3Enable=1.
This HDD LED blinking during suspend does not happen on Linux, not
even when the NVMe device is left in D0 over suspend with the regular
nvme_suspend() path putting the NVMe device into a lower power mode at
the NVMe protocol level.

> What would be the downside of skipping the DMI table and calling
> pci_d3cold_disable() always?  If this truly is a Root Port defect, it
> should affect all platforms with this device, and what's the benefit
> of relying on BIOS to use StorageD3Enable to avoid the defect?

I had rather assumed that it was a platform-specific DSDT bug, in that
PEG0.PXP._OFF is doing something that PEG0.PXP._ON is unable to
recover from, and that other platforms might handle the suspend/resume
of this root port more correctly. Not sure if it is reasonable to
assume that all other platforms on the same chipset have the same bug
(if that's what this is).

Daniel
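
For anyone wanting to compare the two single-device dumps requested above (Root Port before D3hot and after returning to D0) for config that didn't get restored, here is a minimal sketch in Python. It assumes each dump is the text output of "lspci -vvxxxx -s00:06.0" for one device only, and that the hex rows look like "10: 00 00 ...". The function names parse_config_dump and diff_config are hypothetical helpers, not part of pciutils or any existing tool.

```python
import re

# A hex row of `lspci -x...` output: 2-3 digit hex offset, colon, space, byte pairs.
# Device header lines such as "00:06.0 PCI bridge: ..." do not match, because
# the colon is not followed by a space and a run of hex byte pairs.
HEX_ROW = re.compile(r"^([0-9a-f]{2,3}): ([0-9a-f]{2}(?: [0-9a-f]{2})*)$")


def parse_config_dump(text):
    """Return {offset: byte_value} for the hex rows of a single-device dump."""
    space = {}
    for line in text.splitlines():
        m = HEX_ROW.match(line.strip())
        if not m:
            continue  # verbose (-vv) text, blank lines, device header
        base = int(m.group(1), 16)
        for i, byte in enumerate(m.group(2).split()):
            space[base + i] = int(byte, 16)
    return space


def diff_config(before_text, after_text):
    """List (offset, before, after) for every config byte that differs."""
    before = parse_config_dump(before_text)
    after = parse_config_dump(after_text)
    return [
        (off, before.get(off), after.get(off))
        for off in sorted(set(before) | set(after))
        if before.get(off) != after.get(off)
    ]


if __name__ == "__main__":
    import sys

    # Usage (file names are placeholders): python3 cfgdiff.py before.txt after.txt
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as f_before, open(sys.argv[2]) as f_after:
            for off, old, new in diff_config(f_before.read(), f_after.read()):
                fmt = lambda v: "--" if v is None else f"{v:02x}"
                print(f"{off:03x}: {fmt(old)} -> {fmt(new)}")
```

A byte that differs is not necessarily a restore failure, since status and error registers can legitimately change across a power transition, so the output is only a starting point for reading the decoded -vv text.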