Re: [PATCH v4] PCI/ASPM: Add back L1 PM Substate save and restore

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Hi Thomas,

I had updated the bugzilla with a request to run some commands to collect more
information from your system. Are you still able to work on this? Thanks.

https://bugzilla.kernel.org/show_bug.cgi?id=216877#c27

David


On Thu, 2023-10-05 at 17:57 +0200, Thomas Witt wrote:
> On 05/10/2023 17:30, Bjorn Helgaas wrote:
> > On Thu, Oct 05, 2023 at 12:02:58PM +0300, Mika Westerberg wrote:
> > > On Wed, Oct 04, 2023 at 05:23:24PM -0500, Bjorn Helgaas wrote:
> > > ...
> > 
> > > The thing with TUXEDO is that on Thomas's system with mem_sleep=deep
> > > this patch (without denylist) breaks the resume as he describes here:
> > > 
> > > https://bugzilla.kernel.org/show_bug.cgi?id=216877
> > > 
> > > However, on exact same TUXEDO system with the same firmwares Werner is
> > > not able to reproduce the issue with or without the patch. So I'm not
> > > sure what to do and that's why I added denylist that should take effect
> > > on Thomas' system when mem_sleep=deep is set but also work on the same
> > > system without it.
> > > 
> > > And since we have the denylist, I decided to add the ASUS there to avoid
> > > accidentally breaking those too.
> > > ...
> > 
> > > > I think there's still something we're missing.
> > > > 
> > > > We restore the LTR config before restoring DEVCTL2 (including the LTR
> > > > enable bit) and L1SS state.  I don't think we know the state of ASPM
> > > > and L1SS at that point, do we?  Do you think there could be an issue
> > > > there, too?
> > > 
> > > AFAICT LTR does not affect until it is explicitly enabled and I don't
> > > think many drivers actually program it (although we have some sort of
> > > API for it at least for Intel LPSS devices).
> > 
> > I couldn't find anything in the spec that suggests LTR should be an
> > issue.  I'm just grasping at straws here.
> > 
> > There's obviously *something* we're doing wrong because ASPM worked
> > before suspend, so we should be able to get it to work after resume.
> > 
> > Could we learn anything by dumping config space of the problem devices
> > before/after the suspend/resume and comparing them?
> > 
> > If we could figure out a difference between Werner's working TUXEDO
> > and Thomas' non-working TUXEDO, that might be a hint, too.
> > 
> > > If you have suggestions, I'm all open. If I understand you would like
> > > this to be done like:
> > > 
> > >    - Drop the denylist
> > >    - Add back the suspend/restore of L1SS
> > >    - Ask everyone in this thread to try it out
> > > 
> > > I can do that no problem but I guess that for the TUXEDO one (Thomas')
> > > it probably is going to fail still.
> > 
> > Right, without the denylist, I expect Thomas' TUXEDO to fail, but I
> > still hope we can figure out why.  If we just keep it on the denylist,
> > that system will suffer from more power consumption than necessary,
> > but only after suspend/resume.
> > 
> > A denylist seems like the absolute last resort.  In this case we don't
> > know about anything *wrong* with those platforms; all we know is that
> > our resume path doesn't work.  It's likely that it fails on other
> > platforms we haven't heard about, too.
> > 
> > Bjorn
> 
> The best guess from Mika and David was a firmware issue, but I run the 
> same Firmware revision as Werner. I even reflashed the Firmware, but 
> that did not change anything:
> 
> Quoting David Box:
>  > I agree that we should pursue an exception for your system. This is
>  > looking like a firmware bug. One thing we did notice in the turbostat
>  > results is your IRTL (Interrupt Response Time Limit) values are bogus:
>  >
>  > cpu6: MSR_PKGC3_IRTL: 0x0000884e (valid, 79872 ns)
>  > cpu6: MSR_PKGC6_IRTL: 0x00008000 (valid, 0 ns)
>  > cpu6: MSR_PKGC7_IRTL: 0x00008000 (valid, 0 ns)
>  > cpu6: MSR_PKGC8_IRTL: 0x00008000 (valid, 0 ns)
>  > cpu6: MSR_PKGC9_IRTL: 0x00008000 (valid, 0 ns)
>  > cpu6: MSR_PKGC10_IRTL: 0x00008000 (valid, 0 ns)
>  >
>  > This is despite the PKGC configuration register showing that all
>  > states are enabled:
>  >
>  > cpu6: MSR_PKG_CST_CONFIG_CONTROL: 0x1e008008 (UNdemote-C3, 
> UNdemote-C1, demote-
> C3, demote-C1, locked, pkg-cstate-limit=8 (unlimited))
>  >
>  > Firmware sets this.
> 
> Since I can't currently flash modified firmware on this computer, can 
> those values be overridden from userspace?
> 





[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux