On Wed, 3 Aug 2022 at 14:55 Vidya Sagar <vidyas@xxxxxxxxxx> wrote:
>
> Thanks Lukasz for the logs.
> I still see that the L1SS capability in the root port (00:14.0) disappeared
> after resume.
> I still don't understand how this patch can make the capability register
> itself disappear. Honestly, I still see this as a HW issue.
> Bjorn, could you please shed some light on this?
>
> Thanks,
> Vidya Sagar
>
> On 8/3/2022 5:34 PM, Lukasz Majczak wrote:
> > On Fri, 29 Jul 2022 at 16:36 Vidya Sagar <vidyas@xxxxxxxxxx> wrote:
> >>
> >> Hi Lukasz,
> >> Thanks for sharing your observations.
> >>
> >> Could you please also share the output of 'sudo lspci -vvvv' before and
> >> after a suspend-resume cycle with the latest linux-next?
> >> Do we still see the L1SS capabilities disappearing after resume?
> >>
> >> Thanks,
> >> Vidya Sagar
> >>
> >> On 7/29/2022 3:09 PM, Lukasz Majczak wrote:
> >>> On Tue, 26 Jul 2022 at 09:20 Lukasz Majczak <lma@xxxxxxxxxxxx> wrote:
> >>>>
> >>>> On Tue, 26 Jul 2022 at 00:51 Rajat Jain <rajatja@xxxxxxxxxx> wrote:
> >>>>>
> >>>>> Hello,
> >>>>>
> >>>>> On Sat, Jul 23, 2022 at 10:03 AM Vidya Sagar <vidyas@xxxxxxxxxx> wrote:
> >>>>>>
> >>>>>> Agree with Bjorn's observations.
> >>>>>> The fact that the L1SS capability registers themselves disappeared in
> >>>>>> the root port post resume indicates that there seems to be something
> >>>>>> wrong with the BIOS itself.
> >>>>>> Could you please check from that perspective?
> >>>>>
> >>>>> ChromeOS Intel platforms use S0ix (suspend-to-idle) for suspend. This
> >>>>> is a shallower sleep state that preserves more state than, e.g., S3
> >>>>> (suspend-to-RAM). When we use S0ix, the BIOS does not come into the
> >>>>> picture at all, i.e. after the kernel runs its suspend routines, it
> >>>>> just puts the CPU into the S0ix state. So I do not think there is a
> >>>>> BIOS angle to this.
> >>>>>
> >>>>>
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Vidya Sagar
> >>>>>>
> >>>>>>
> >>>>>> On 7/22/2022 11:12 PM, Bjorn Helgaas wrote:
> >>>>>>> On Fri, Jul 22, 2022 at 11:41:14AM +0200, Lukasz Majczak wrote:
> >>>>>>>> On Fri, 22 Jul 2022 at 09:31 Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx> wrote:
> >>>>>>>>> On Fri, Jul 15, 2022 at 6:38 PM Ben Chuang <benchuanggli@xxxxxxxxx> wrote:
> >>>>>>>>>> On Tue, Jul 5, 2022 at 2:00 PM Vidya Sagar <vidyas@xxxxxxxxxx> wrote:
> >>>>>>>>>>>
> >>>>>>>>>>> Previously ASPM L1 Substates control registers (CTL1 and CTL2) weren't
> >>>>>>>>>>> saved and restored during suspend/resume, leading to L1 Substates
> >>>>>>>>>>> configuration being lost post-resume.
> >>>>>>>>>>>
> >>>>>>>>>>> Save the L1 Substates control registers so that the configuration is
> >>>>>>>>>>> retained post-resume.
> >>>>>>>>>>>
> >>>>>>>>>>> Signed-off-by: Vidya Sagar <vidyas@xxxxxxxxxx>
> >>>>>>>>>>> Tested-by: Abhishek Sahu <abhsahu@xxxxxxxxxx>
> >>>>>>>>>>
> >>>>>>>>>> Hi Vidya,
> >>>>>>>>>>
> >>>>>>>>>> I tested this patch on kernel v5.19-rc6.
> >>>>>>>>>> The test device is a GL9755 card reader controller on an Intel i5-10210U RVP.
> >>>>>>>>>> This patch can restore L1SS after suspend/resume.
> >>>>>>>>>>
> >>>>>>>>>> The test results are as follows:
> >>>>>>>>>>
> >>>>>>>>>> After boot:
> >>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>     Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>         L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>                   ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>                   PortCommonModeRestoreTime=255us
> >>>>>>>>>>                   PortTPowerOnTime=3100us
> >>>>>>>>>>         L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>                    T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>         L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> After suspend/resume without this patch:
> >>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>     Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>         L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>                   ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>                   PortCommonModeRestoreTime=255us
> >>>>>>>>>>                   PortTPowerOnTime=3100us
> >>>>>>>>>>         L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>>>>                    T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>>>>         L1SubCtl2: T_PwrOn=10us
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> After suspend/resume with this patch:
> >>>>>>>>>> #lspci -d 17a0:9755 -vvv | grep -A5 "L1 PM Substates"
> >>>>>>>>>>     Capabilities: [110 v1] L1 PM Substates
> >>>>>>>>>>         L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+
> >>>>>>>>>>                   ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>>>>                   PortCommonModeRestoreTime=255us
> >>>>>>>>>>                   PortTPowerOnTime=3100us
> >>>>>>>>>>         L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>>>>                    T_CommonMode=0us LTR1.2_Threshold=3145728ns
> >>>>>>>>>>         L1SubCtl2: T_PwrOn=3100us
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Tested-by: Ben Chuang <benchuanggli@xxxxxxxxx>
> >>>>>>>>>
> >>>>>>>>> Forgot to add mine:
> >>>>>>>>> Tested-by: Kai-Heng Feng <kai.heng.feng@xxxxxxxxxxxxx>
> >>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>> Best regards,
> >>>>>>>>>> Ben Chuang
> >>>>>>>>>>
> >>>>>>>>>>
> >>>>>>>>>>> ---
> >>>>>>>>>>> Hi,
> >>>>>>>>>>> Kenneth R. Crudup <kenny@xxxxxxxxx>, could you please verify this patch
> >>>>>>>>>>> on your laptop (Dell XPS 13) one last time?
> >>>>>>>>>>> IMHO, the regression observed on your laptop with an old version of the patch
> >>>>>>>>>>> could be due to a buggy old BIOS version in the laptop.
> >>>>>>>>>>>
> >>>>>>>>>>> Thanks,
> >>>>>>>>>>> Vidya Sagar
> >>>>>>>>>>>
> >>>>>>>>>>>  drivers/pci/pci.c       |  7 +++++++
> >>>>>>>>>>>  drivers/pci/pci.h       |  4 ++++
> >>>>>>>>>>>  drivers/pci/pcie/aspm.c | 44 ++++++++++++++++++++++++++++++++++++++++++++
> >>>>>>>>>>>  3 files changed, 55 insertions(+)
> >>>>>>>>>>>
> >>>>>>>>>>> diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
> >>>>>>>>>>> index cfaf40a540a8..aca05880aaa3 100644
> >>>>>>>>>>> --- a/drivers/pci/pci.c
> >>>>>>>>>>> +++ b/drivers/pci/pci.c
> >>>>>>>>>>> @@ -1667,6 +1667,7 @@ int pci_save_state(struct pci_dev *dev)
> >>>>>>>>>>>  		return i;
> >>>>>>>>>>>  
> >>>>>>>>>>>  	pci_save_ltr_state(dev);
> >>>>>>>>>>> +	pci_save_aspm_l1ss_state(dev);
> >>>>>>>>>>>  	pci_save_dpc_state(dev);
> >>>>>>>>>>>  	pci_save_aer_state(dev);
> >>>>>>>>>>>  	pci_save_ptm_state(dev);
> >>>>>>>>>>> @@ -1773,6 +1774,7 @@ void pci_restore_state(struct pci_dev *dev)
> >>>>>>>>>>>  	 * LTR itself (in the PCIe capability).
> >>>>>>>>>>>  	 */
> >>>>>>>>>>>  	pci_restore_ltr_state(dev);
> >>>>>>>>>>> +	pci_restore_aspm_l1ss_state(dev);
> >>>>>>>>>>>  
> >>>>>>>>>>>  	pci_restore_pcie_state(dev);
> >>>>>>>>>>>  	pci_restore_pasid_state(dev);
> >>>>>>>>>>> @@ -3489,6 +3491,11 @@ void pci_allocate_cap_save_buffers(struct pci_dev *dev)
> >>>>>>>>>>>  	if (error)
> >>>>>>>>>>>  		pci_err(dev, "unable to allocate suspend buffer for LTR\n");
> >>>>>>>>>>>  
> >>>>>>>>>>> +	error = pci_add_ext_cap_save_buffer(dev, PCI_EXT_CAP_ID_L1SS,
> >>>>>>>>>>> +					    2 * sizeof(u32));
> >>>>>>>>>>> +	if (error)
> >>>>>>>>>>> +		pci_err(dev, "unable to allocate suspend buffer for ASPM-L1SS\n");
> >>>>>>>>>>> +
> >>>>>>>>>>>  	pci_allocate_vc_save_buffers(dev);
> >>>>>>>>>>>  }
> >>>>>>>>>>>  
> >>>>>>>>>>> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> >>>>>>>>>>> index e10cdec6c56e..92d8c92662a4 100644
> >>>>>>>>>>> --- a/drivers/pci/pci.h
> >>>>>>>>>>> +++ b/drivers/pci/pci.h
> >>>>>>>>>>> @@ -562,11 +562,15 @@ void pcie_aspm_init_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>  void pcie_aspm_exit_link_state(struct pci_dev *pdev);
> >>>>>>>>>>>  void pcie_aspm_pm_state_change(struct pci_dev *pdev);
> >>>>>>>>>>>  void pcie_aspm_powersave_config_link(struct pci_dev *pdev);
> >>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev);
> >>>>>>>>>>>  #else
> >>>>>>>>>>>  static inline void pcie_aspm_init_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>  static inline void pcie_aspm_exit_link_state(struct pci_dev *pdev) { }
> >>>>>>>>>>>  static inline void pcie_aspm_pm_state_change(struct pci_dev *pdev) { }
> >>>>>>>>>>>  static inline void pcie_aspm_powersave_config_link(struct pci_dev *pdev) { }
> >>>>>>>>>>> +static inline void pci_save_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>> +static inline void pci_restore_aspm_l1ss_state(struct pci_dev *dev) { }
> >>>>>>>>>>>  #endif
> >>>>>>>>>>>  
> >>>>>>>>>>>  #ifdef CONFIG_PCIE_ECRC
> >>>>>>>>>>> diff --git a/drivers/pci/pcie/aspm.c b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>> index a96b7424c9bc..2c29fdd20059 100644
> >>>>>>>>>>> --- a/drivers/pci/pcie/aspm.c
> >>>>>>>>>>> +++ b/drivers/pci/pcie/aspm.c
> >>>>>>>>>>> @@ -726,6 +726,50 @@ static void pcie_config_aspm_l1ss(struct pcie_link_state *link, u32 state)
> >>>>>>>>>>>  				PCI_L1SS_CTL1_L1SS_MASK, val);
> >>>>>>>>>>>  }
> >>>>>>>>>>>  
> >>>>>>>>>>> +void pci_save_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +	int aspm_l1ss;
> >>>>>>>>>>> +	struct pci_cap_saved_state *save_state;
> >>>>>>>>>>> +	u32 *cap;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	if (!pci_is_pcie(dev))
> >>>>>>>>>>> +		return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>> +	if (!aspm_l1ss)
> >>>>>>>>>>> +		return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>> +	if (!save_state)
> >>>>>>>>>>> +		return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>> +	pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, cap++);
> >>>>>>>>>>> +	pci_read_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, cap++);
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>> +void pci_restore_aspm_l1ss_state(struct pci_dev *dev)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +	int aspm_l1ss;
> >>>>>>>>>>> +	struct pci_cap_saved_state *save_state;
> >>>>>>>>>>> +	u32 *cap;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	if (!pci_is_pcie(dev))
> >>>>>>>>>>> +		return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	aspm_l1ss = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>> +	if (!aspm_l1ss)
> >>>>>>>>>>> +		return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	save_state = pci_find_saved_ext_cap(dev, PCI_EXT_CAP_ID_L1SS);
> >>>>>>>>>>> +	if (!save_state)
> >>>>>>>>>>> +		return;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	cap = (u32 *)&save_state->cap.data[0];
> >>>>>>>>>>> +	pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL2, *cap++);
> >>>>>>>>>>> +	pci_write_config_dword(dev, aspm_l1ss + PCI_L1SS_CTL1, *cap++);
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>>  static void pcie_config_aspm_dev(struct pci_dev *pdev, u32 val)
> >>>>>>>>>>>  {
> >>>>>>>>>>>  	pcie_capability_clear_and_set_word(pdev, PCI_EXP_LNKCTL,
> >>>>>>>>>>> -- 
> >>>>>>>>>>> 2.17.1
> >>>>>>>>>>>
> >>>>>>>>
> >>>>>>>> Hi,
> >>>>>>>>
> >>>>>>>> With this patch (and also the one mentioned at
> >>>>>>>> https://lore.kernel.org/all/20220509073639.2048236-1-kai.heng.feng@xxxxxxxxxxxxx/)
> >>>>>>>> applied on 5.10 (chromeos-5.10), I am observing problems after
> >>>>>>>> suspend/resume with my WiFi card: it looks like all communication
> >>>>>>>> via PCI fails. Attaching logs (dmesg, lspci -vvv before and after
> >>>>>>>> suspend/resume): https://gist.github.com/semihalf-majczak-lukasz/fb36dfa2eff22911109dfb91ab0fc0e3
> >>>>>>>>
> >>>>>>>> I played a little bit with this code, and it looks like the
> >>>>>>>> pci_write_config_dword() to PCI_L1SS_CTL1 breaks it (I don't know
> >>>>>>>> why; I'm not a PCI expert).
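To make the save/restore flow of the patch concrete, here is a minimal user-space sketch under loudly stated assumptions: config space is modeled as a plain array (`struct fake_dev`), and `find_ext_cap()`, `save_l1ss()` and `restore_l1ss()` are hypothetical stand-ins loosely modeled on `pci_find_ext_capability()` and the patch's `pci_save_aspm_l1ss_state()`/`pci_restore_aspm_l1ss_state()`; none of them are kernel APIs. Like the patch, it saves and restores CTL2 before CTL1, and it skips the restore entirely when the L1 PM Substates capability can no longer be found in the extended capability list. It does not reproduce the failure reported above; it only illustrates the control flow.

```c
#include <assert.h>
#include <stdint.h>

#define CFG_SPACE_SIZE  4096
#define EXT_CAP_START   0x100  /* extended capabilities start after the first 256 bytes */
#define EXT_CAP_ID_L1SS 0x1e   /* same value as PCI_EXT_CAP_ID_L1SS */
#define L1SS_CTL1       0x08   /* same offset as PCI_L1SS_CTL1 */
#define L1SS_CTL2       0x0c   /* same offset as PCI_L1SS_CTL2 */

/* Hypothetical device: nothing but a dword-addressed config space. */
struct fake_dev {
	uint32_t cfg[CFG_SPACE_SIZE / 4];
};

static uint32_t cfg_read(const struct fake_dev *d, int off)
{
	return d->cfg[off / 4];
}

static void cfg_write(struct fake_dev *d, int off, uint32_t val)
{
	d->cfg[off / 4] = val;
}

/* Walk the extended capability list; return the capability offset, or 0 if absent. */
static int find_ext_cap(const struct fake_dev *d, uint16_t id)
{
	int pos = EXT_CAP_START;
	int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;  /* bound the walk against loops */

	while (pos >= EXT_CAP_START && ttl-- > 0) {
		uint32_t hdr = cfg_read(d, pos);

		if (hdr == 0 || hdr == 0xffffffffu)  /* empty list or failed read */
			return 0;
		if ((hdr & 0xffff) == id)
			return pos;
		pos = (hdr >> 20) & 0xffc;           /* next-capability offset */
	}
	return 0;
}

struct l1ss_saved {
	uint32_t ctl2, ctl1;
	int valid;
};

static void save_l1ss(const struct fake_dev *d, struct l1ss_saved *s)
{
	int pos = find_ext_cap(d, EXT_CAP_ID_L1SS);

	s->valid = 0;
	if (!pos)
		return;
	s->ctl2 = cfg_read(d, pos + L1SS_CTL2);
	s->ctl1 = cfg_read(d, pos + L1SS_CTL1);
	s->valid = 1;
}

/* Return 1 if the registers were written back, 0 if the restore was skipped. */
static int restore_l1ss(struct fake_dev *d, const struct l1ss_saved *s)
{
	int pos = find_ext_cap(d, EXT_CAP_ID_L1SS);

	if (!pos || !s->valid)
		return 0;
	cfg_write(d, pos + L1SS_CTL2, s->ctl2);  /* timing parameters first, as in the patch */
	cfg_write(d, pos + L1SS_CTL1, s->ctl1);  /* then the enable bits */
	return 1;
}

/* Usage: restore happens while the capability is listed and is skipped once it is gone. */
static void l1ss_demo(void)
{
	static struct fake_dev dev;                /* zero-initialized */
	struct l1ss_saved saved;

	cfg_write(&dev, 0x100, 0x15010001);       /* arbitrary cap ID 0x0001, next = 0x150 */
	cfg_write(&dev, 0x150, 0x20000000);       /* Null cap (ID 0), next = 0x200 */
	cfg_write(&dev, 0x200, 0x0001001e);       /* L1SS cap (ID 0x1e), end of list */
	cfg_write(&dev, 0x200 + L1SS_CTL1, 0x6003280f);
	save_l1ss(&dev, &saved);
	assert(saved.valid && saved.ctl1 == 0x6003280f);

	cfg_write(&dev, 0x200 + L1SS_CTL1, 0);    /* pretend resume cleared CTL1 */
	assert(restore_l1ss(&dev, &saved) == 1);
	assert(cfg_read(&dev, 0x200 + L1SS_CTL1) == 0x6003280f);

	cfg_write(&dev, 0x150, 0);                /* pretend the list now ends early */
	assert(find_ext_cap(&dev, EXT_CAP_ID_L1SS) == 0);
	assert(restore_l1ss(&dev, &saved) == 0);
}
```

The demo layout is loosely based on root port 00:14.0 in the lspci diff later in this thread (a Null capability at 0x150 followed by L1 PM Substates at 0x200); the first capability at 0x100 and all register values are made up for illustration.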
> >>>>>>>
> >>>>>>> Thanks a lot for testing this! I'm not quite sure what to make of the
> >>>>>>> results since v5.10 is fairly old (Dec 2020) and I don't know what
> >>>>>>> other changes are in chromeos-5.10.
> >>>>>
> >>>>> Lukasz: I assume you are running this on Atlas and are seeing this bug
> >>>>> when uprev'ving it to the 5.10 kernel. Can you please try it on a newer
> >>>>> Intel platform that has the latest upstream kernel running already
> >>>>> and see if this can be reproduced there too?
> >>>>> Note that the wifi PCI device is different on newer Intel platforms,
> >>>>> but the platform design is similar enough that I suspect we should see
> >>>>> a similar bug on those too. The other option is to try the latest
> >>>>> upstream kernel on Atlas. Perhaps if we just care about wifi (and
> >>>>> ignore bringing up the graphics stack and GUI), it may come up
> >>>>> well enough to try this patch?
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> Rajat
> >>>>>
> >>>>>
> >>>>>>>
> >>>>>>> Random observations, no analysis below. This from your dmesg
> >>>>>>> certainly looks like PCI reads failing and returning ~0:
> >>>>>>>
> >>>>>>>   Timeout waiting for hardware access (CSR_GP_CNTRL 0xffffffff)
> >>>>>>>   iwlwifi 0000:01:00.0: 00000000: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> >>>>>>>   iwlwifi 0000:01:00.0: Device gone - attempting removal
> >>>>>>>   Hardware became unavailable upon resume. This could be a software issue prior to suspend or a hardware issue.
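Bjorn's reading of this log rests on a common convention: a read from a device that has stopped responding typically completes with all-ones data, which is why CSR_GP_CNTRL reads back as 0xffffffff and the whole register dump is ffffffff. A minimal sketch of that check follows. `read_looks_failed()` and `dump_looks_like_device_gone()` are hypothetical helper names, not iwlwifi or PCI core functions, and because some registers can legitimately hold all ones, this is only a heuristic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: a read that returns all ones (~0) is the usual
 * symptom of a device that has stopped responding. Some registers can
 * legitimately hold 0xffffffff, so this is a heuristic, not a proof.
 */
static bool read_looks_failed(uint32_t val)
{
	return val == UINT32_MAX;
}

/* True only if a non-empty dump consists entirely of ~0 dwords. */
static bool dump_looks_like_device_gone(const uint32_t *words, size_t n)
{
	if (n == 0)
		return false;
	for (size_t i = 0; i < n; i++)
		if (!read_looks_failed(words[i]))
			return false;
	return true;
}

/* Usage: the values from the dmesg excerpt above. */
static void device_gone_demo(void)
{
	const uint32_t dump[8] = {
		0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
		0xffffffff, 0xffffffff, 0xffffffff, 0xffffffff,
	};
	const uint32_t mixed[2] = { 0xffffffff, 0x00000000 };

	assert(read_looks_failed(0xffffffff));      /* CSR_GP_CNTRL in the log */
	assert(dump_looks_like_device_gone(dump, 8));
	assert(!dump_looks_like_device_gone(mixed, 2));
}
```

Requiring every dword to be ~0 (rather than any one of them) keeps a single legitimately all-ones register from being mistaken for a vanished device.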
> >>>>>>>
> >>>>>>> And then we re-enumerate 01:00.0 and it looks like it may have been
> >>>>>>> reset (BAR is 0):
> >>>>>>>
> >>>>>>>   pci 0000:01:00.0: [8086:095a] type 00 class 0x028000
> >>>>>>>   pci 0000:01:00.0: reg 0x10: [mem 0x00000000-0x00001fff 64bit]
> >>>>>>>
> >>>>>>> lspci diffs from before/after suspend:
> >>>>>>>
> >>>>>>>    00:14.0 PCI bridge: Intel Corporation Celeron N3350/Pentium N4200/Atom E3900 Series PCI Express Port B #1 (rev fb) (prog-if 00 [Normal decode])
> >>>>>>>        Bus: primary=00, secondary=01, subordinate=01, sec-latency=64
> >>>>>>>   -        DevSta: CorrErr- NonFatalErr+ FatalErr- UnsupReq+ AuxPwr+ TransPend-
> >>>>>>>   +        DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq- AuxPwr+ TransPend-
> >>>>>>>   -        LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>   +        LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>   -        LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>   +        LnkSta2: Current De-emphasis Level: -3.5dB, EqualizationComplete- EqualizationPhase1-
> >>>>>>>   -    Capabilities: [150 v0] Null
> >>>>>>>   -    Capabilities: [200 v1] L1 PM Substates
> >>>>>>>   -        L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>   -                  PortCommonModeRestoreTime=40us PortTPowerOnTime=10us
> >>>>>>>   -        L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>   -                   T_CommonMode=40us LTR1.2_Threshold=98304ns
> >>>>>>>   -        L1SubCtl2: T_PwrOn=60us
> >>>>>>>
> >>>>>>> The DevSta differences might be BIOS bugs, probably not relevant.
> >>>>>>> Interesting that ASPM is disabled, maybe didn't get enabled after
> >>>>>>> re-enumerating 01:00.0? Strange that the L1 PM Substates capability
> >>>>>>> disappeared.
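The L1SubCtl1/L1SubCtl2 lines in these lspci dumps are decodings of the two registers the patch saves (CTL1 and CTL2). The sketch below shows one way to decode them, assuming the PCIe L1 PM Substates field layout: enable bits in CTL1[3:0], Common_Mode_Restore_Time (in us) in CTL1[15:8], the LTR_L1.2_THRESHOLD value in CTL1[25:16] with its scale in CTL1[31:29] (multiplier 32^scale ns), and the T_POWER_ON scale and value in CTL2[1:0] and CTL2[7:3]. The function names are hypothetical, and the raw register values in the demo do not come from the logs: they are one encoding chosen so that it decodes to the root port's printed values (T_CommonMode=40us, LTR1.2_Threshold=98304ns, T_PwrOn=60us).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Field masks, following the layout of PCI_L1SS_CTL1_* / PCI_L1SS_CTL2_*. */
#define CTL1_PCIPM_L1_2      0x00000001u
#define CTL1_PCIPM_L1_1      0x00000002u
#define CTL1_ASPM_L1_2       0x00000004u
#define CTL1_ASPM_L1_1       0x00000008u
#define CTL1_CM_RESTORE      0x0000ff00u  /* Common_Mode_Restore_Time, in us */
#define CTL1_LTR_L12_VALUE   0x03ff0000u  /* LTR_L1.2_THRESHOLD_Value */
#define CTL1_LTR_L12_SCALE   0xe0000000u  /* LTR_L1.2_THRESHOLD_Scale */
#define CTL2_T_PWR_ON_SCALE  0x00000003u
#define CTL2_T_PWR_ON_VALUE  0x000000f8u

/* Any of the four L1.1/L1.2 enable bits set ("+" in lspci's L1SubCtl1). */
static bool l1ss_any_enabled(uint32_t ctl1)
{
	return (ctl1 & (CTL1_PCIPM_L1_2 | CTL1_PCIPM_L1_1 |
			CTL1_ASPM_L1_2 | CTL1_ASPM_L1_1)) != 0;
}

/* T_CommonMode, in microseconds. */
static unsigned l1ss_common_mode_us(uint32_t ctl1)
{
	return (ctl1 & CTL1_CM_RESTORE) >> 8;
}

/* LTR1.2_Threshold in ns: value * 32^scale; scales 6 and 7 are reserved. */
static uint64_t l1ss_ltr_threshold_ns(uint32_t ctl1)
{
	uint64_t value = (ctl1 & CTL1_LTR_L12_VALUE) >> 16;
	unsigned scale = (ctl1 & CTL1_LTR_L12_SCALE) >> 29;

	if (scale > 5)
		return 0;
	return value << (5 * scale);   /* 32^scale == 2^(5 * scale) */
}

/* T_PwrOn in us: value * {2, 10, 100}[scale]; scale 3 is reserved. */
static unsigned l1ss_t_pwr_on_us(uint32_t ctl2)
{
	static const unsigned scale_us[] = { 2, 10, 100 };
	unsigned scale = ctl2 & CTL2_T_PWR_ON_SCALE;
	unsigned value = (ctl2 & CTL2_T_PWR_ON_VALUE) >> 3;

	if (scale > 2)
		return 0;
	return value * scale_us[scale];
}

/* Usage: one encoding that decodes to the root port's pre-suspend values above. */
static void l1ss_decode_demo(void)
{
	uint32_t ctl1 = 0x6003280f;  /* enables 0xf, T_CommonMode 40, value 3, scale 3 */
	uint32_t ctl2 = 0x00000031;  /* T_POWER_ON value 6, scale 1 (10 us) */

	assert(l1ss_any_enabled(ctl1));
	assert(l1ss_common_mode_us(ctl1) == 40);
	assert(l1ss_ltr_threshold_ns(ctl1) == 98304);
	assert(l1ss_t_pwr_on_us(ctl2) == 60);

	/* A cleared CTL1, as on 01:00.0 after resume: all enables off, threshold 0ns. */
	assert(!l1ss_any_enabled(0));
	assert(l1ss_ltr_threshold_ns(0) == 0);
}
```

The same helpers decode, for example, CTL1 = 0x8003000f and CTL2 = 0x000000fa to LTR1.2_Threshold=3145728ns and T_PwrOn=3100us, the values in Ben's GL9755 output; again, these are reconstructed encodings, not register dumps from the thread.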
> >>>>>>>
> >>>>>>>    01:00.0 Network controller: Intel Corporation Wireless 7265 (rev 59)
> >>>>>>>            LnkCtl: ASPM L1 Enabled; RCB 64 bytes, Disabled- CommClk+
> >>>>>>>   -                ExtSynch- ClockPM+ AutWidDis- BWInt- AutBWInt-
> >>>>>>>   +                ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
> >>>>>>>        Capabilities: [154 v1] L1 PM Substates
> >>>>>>>            L1SubCap: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+ L1_PM_Substates+
> >>>>>>>                      PortCommonModeRestoreTime=30us PortTPowerOnTime=60us
> >>>>>>>   -        L1SubCtl1: PCI-PM_L1.2+ PCI-PM_L1.1+ ASPM_L1.2+ ASPM_L1.1+
> >>>>>>>   -                   T_CommonMode=0us LTR1.2_Threshold=98304ns
> >>>>>>>   +        L1SubCtl1: PCI-PM_L1.2- PCI-PM_L1.1- ASPM_L1.2- ASPM_L1.1-
> >>>>>>>   +                   T_CommonMode=0us LTR1.2_Threshold=0ns
> >>>>>>>
> >>>>>>> Dmesg claimed we reconfigured common clock config. Maybe ASPM didn't
> >>>>>>> get reinitialized after re-enumeration? Looks like we didn't restore
> >>>>>>> L1SubCtl1.
> >>>>>>>
> >>>>>>> Bjorn
> >>>>>>>
> >>>>
> >>>> Hi,
> >>>>
> >>>> Thank you all for the response and input! As Rajat mentioned, I'm using
> >>>> a Chromebook, but not Atlas (Amberlake); in this case it is Babymega
> >>>> (Apollolake). I will try to load the most recent kernel and give it a
> >>>> try once again.
> >>>>
> >>>> Best regards,
> >>>> Lukasz
> >>>
> >>> Hi,
> >>>
> >>> I have applied this patch on top of v5.19-rc7 (chromeos) and I'm
> >>> still getting the same results:
> >>> https://gist.github.com/semihalf-majczak-lukasz/4b716704c21a3758d6711b2030ea34b9
> >>>
> >>> Best regards,
> >>> Lukasz
> >>>
> > Hi Vidya,
> >
> > Sorry for the long delay. I have retested your patch on top of
> > linux-next/master (next-20220802); the results for my device remain
> > the same.
> > Here are the logs (lspci -vvv before suspend, lspci -vvv after resume, and dmesg):
> > https://gist.github.com/semihalf-majczak-lukasz/c7bfd811359f23278034056a8002b3ef
> > Let me know if you need any more logs and/or tests.
> >
> > Best regards,
> > Lukasz

Hi Vidya,

After your last email, I re-tested my setup, and (without your patch) the
capability register also disappears. So it looks like there is, in fact,
some problem in my setup, and your patch just brings it to the surface
because, after resume, it tries to write to a register that is no longer
present. I'm very sorry for the confusion here; I didn't notice that at
the very beginning.

Best regards,
Lukasz