On Wed, Feb 05, 2025 at 10:38:24AM +0200, Ilpo Järvinen wrote: > On Tue, 4 Feb 2025, Bjorn Helgaas wrote: > > > [+cc Rafael] > > > > On Fri, Jan 31, 2025 at 05:29:13PM +0200, Ilpo Järvinen wrote: > > > The commit 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in > > > pci_save_aspm_l1ss_state()") aimed to perform L1SS config save for both > > > the Upstream Port and its upstream bridge when handling an Upstream > > > Port, which matches what the L1SS restore side does. However, > > > parent->state_saved can be set true at an earlier time when the > > > upstream bridge saved other parts of its state. > > > > So I guess the scenario is that we got here because some driver called > > pci_save_state(pdev): > > > > pci_save_state > > dev->state_saved = true <-- > > pci_save_pcie_state > > pci_save_aspm_l1ss_state > > if (pcie_downstream_port(pdev)) > > return > > # save pdev L1SS state here > > if (parent->state_saved) <-- > > return > > # save parent L1SS state here > > > > and the problem is that we previously called pci_save_state(parent), > > which set "parent->state_saved = true" but did not save its L1SS state > > because pci_save_aspm_l1ss_state() is a no-op for Downstream Ports, > > right? > > Yes! An unfortunate interaction between those two checks. > > > But I would think this would be a very common situation because > > pcie_portdrv_probe() calls pci_save_state() for Downstream Ports when > > pciehp, AER, PME, etc, are enabled, and this would happen before the > > pci_save_state() calls from Endpoint drivers. > > > > So I'm a little surprised that this didn't blow up for everybody > > immediately. Is there something that makes this unusual? > > I agree it should be very common and was quite surprised that -next > did not catch it. What I recall though is you modified the patch while > applying it by adding those Downstream Port checks so the fix > patch's Tested-by was for different code from what got applied (and it > would have been caught would the original author have tested also the > modified commit). Sigh, that makes sense, it's probably my fault, sorry. I apologize for messing this up and causing all this extra work. I applied this to pci/for-linus yesterday, so it should make it for v6.14-rc2. > Unfortunately, I cannot think of anything that would be so unusual to > warrant not detecting it earlier. Maybe it was just the holiday period > causing less testing and lower level of awareness in general? The machine > doesn't always hang because of the problem as was the case with Niklāvs, > so it might have occurred but went unnoticed if it occurred for a device > that is not critical. > > > > Then later when > > > attempting to save the L1SS config while handling the Upstream Port, > > > parent->state_saved is true in pci_save_aspm_l1ss_state() resulting in > > > early return and skipping saving bridge's L1SS config because it is > > > assumed to be already saved. Later on restore, junk is written into > > > L1SS config which causes issues with some devices. > > > > > > Remove parent->state_saved check and unconditionally save L1SS config > > > also for the upstream bridge from an Upstream Port which ought to be > > > harmless from correctness point of view. With the Upstream Port check > > > now present, saving the L1SS config more than once for the bridge is no > > > longer a problem (unlike when the parent->state_saved check got > > > introduced into the fix during its development). > > > > > > Fixes: 1db806ec06b7 ("PCI/ASPM: Save parent L1SS config in pci_save_aspm_l1ss_state()") > > > Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219731 > > > Reported-by: Niklāvs Koļesņikovs <pinkflames.linux@xxxxxxxxx> > > > Tested-by: Niklāvs Koļesņikovs <pinkflames.linux@xxxxxxxxx> > > > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@xxxxxxxxxxxxxxx> > > > --- > > > drivers/pci/pcie/aspm.c | 3 --- > > > 1 file changed, 3 deletions(-) > > > > > > diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c > > > index e0bc90597dca..da3e7edcf49d 100644 > > > --- a/drivers/pci/pcie/aspm.c > > > +++ b/drivers/pci/pcie/aspm.c > > > @@ -108,9 +108,6 @@ void pci_save_aspm_l1ss_state(struct pci_dev *pdev) > > > pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL2, cap++); > > > pci_read_config_dword(pdev, pdev->l1ss + PCI_L1SS_CTL1, cap++); > > > > > > - if (parent->state_saved) > > > - return; > > > - > > > /* > > > * Save parent's L1 substate configuration so we have it for > > > * pci_restore_aspm_l1ss_state(pdev) to restore. > > > > > > base-commit: 72deda0abee6e705ae71a93f69f55e33be5bca5c > > > -- > > > 2.39.5 > > > > > > > -- > i.