On Tue, 4 Feb 2025, Bjorn Helgaas wrote:

> [+cc Rafael]
>
> On Fri, Jan 31, 2025 at 05:29:13PM +0200, Ilpo Järvinen wrote:
> > The commit 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in
> > pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both
> > the Upstream Port and its upstream bridge when handling an Upstream
> > Port, which matches what the L1SS restore side does. However,
> > parent->state_saved can be set true at an earlier time when the
> > upstream bridge saved other parts of its state.
>
> So I guess the scenario is that we got here because some driver called
> pci_save_state(pdev):
>
>   pci_save_state
>     dev->state_saved = true                 <--
>     pci_save_pcie_state
>       pci_save_aspm_l1ss_state
>         if (pcie_downstream_port(pdev))
>           return
>         # save pdev L1SS state here
>         if (parent->state_saved)            <--
>           return
>         # save parent L1SS state here
>
> and the problem is that we previously called pci_save_state(parent),
> which set "parent->state_saved = true" but did not save its L1SS state
> because pci_save_aspm_l1ss_state() is a no-op for Downstream Ports,
> right?

Yes! An unfortunate interaction between those two checks.

> But I would think this would be a very common situation because
> pcie_portdrv_probe() calls pci_save_state() for Downstream Ports when
> pciehp, AER, PME, etc, are enabled, and this would happen before the
> pci_save_state() calls from Endpoint drivers.
>
> So I'm a little surprised that this didn't blow up for everybody
> immediately.  Is there something that makes this unusual?

I agree it should be very common and was quite surprised that -next
did not catch it. What I recall, though, is that you modified the patch
while applying it by adding those Downstream Port checks, so the fix
patch's Tested-by was for different code from what got applied (and it
would have been caught had the original author also tested the modified
commit).

Unfortunately, I cannot think of anything unusual enough to explain
why it was not detected earlier.
Maybe it was just the holiday period causing less testing and a lower
level of awareness in general? The problem does not always hang the
machine as it did for Niklāvs, so it may have occurred but gone
unnoticed when it affected a device that is not critical.

> > Then later when
> > attempting to save the L1SS config while handling the Upstream Port,
> > parent->state_saved is true in pci_save_aspm_l1ss_state() resulting in
> > early return and skipping saving bridge's L1SS config because it is
> > assumed to be already saved. Later on restore, junk is written into
> > L1SS config which causes issues with some devices.
> >
> > Remove parent->state_saved check and unconditionally save L1SS config
> > also for the upstream bridge from an Upstream Port which ought to be
> > harmless from correctness point of view. With the Upstream Port check
> > now present, saving the L1SS config more than once for the bridge is no
> > longer a problem (unlike when the parent->state_saved check got
> > introduced into the fix during its development).
> >
> > Fixes: 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()")
> > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731
> > Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@xxxxxxxxx>
> > Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@xxxxxxxxx>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx>
> > ---
> >  drivers/pci/pcie/aspm.c | 3 ---
> >  1 file changed, 3 deletions(-)
> >
> > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> > index e0bc90597dca..da3e7edcf49d 100644
> > --- a/drivers/pci/pcie/aspm.c
> > +++ b/drivers/pci/pcie/aspm.c
> > @@ -108,9 +108,6 @@ void pci_save_aspm_l1ss_state(struct pci_dev *pdev)
> >  	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL2, cap++);
> >  	pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL1, cap++);
> >  
> > -	if (parent->state_saved)
> > -		return;
> > -
> >  	/*
> >  	 * Save parent's L1 substate configuration so we have it for
> >  	 * pci_restore_aspm_l1ss_state(pdev) to restore.
> >
> > base-commit: 72deda0abee6e705ae71a93f69f55e33be5bca5c
> > -- 
> > 2.39.5
> >
>

-- 
 i.