Re: [PATCH v2] PCI: vmd: Enable PCI PM's L1 substates of remapped PCIe Root Port and NVMe

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Feb 06, 2024 at 01:25:29PM -0800, David E. Box wrote:
> On Mon, 2024-02-05 at 15:05 -0800, David E. Box wrote:
> > On Mon, 2024-02-05 at 16:42 -0600, Bjorn Helgaas wrote:
> > > On Mon, Feb 05, 2024 at 11:37:16AM -0800, David E. Box wrote:
> > > > On Fri, 2024-02-02 at 18:05 -0600, Bjorn Helgaas wrote:
> > > > > On Fri, Feb 02, 2024 at 03:11:12PM +0800, Jian-Hong Pan wrote:
> > > > ...
> > > 
> > > > > > @@ -775,6 +773,14 @@ static int vmd_pm_enable_quirk(struct pci_dev
> > > > > > *pdev,
> > > > > > void *userdata)
> > > > > >         pci_write_config_dword(pdev, pos + PCI_LTR_MAX_SNOOP_LAT,
> > > > > > ltr_reg);
> > > > > >         pci_info(pdev, "VMD: Default LTR value set by driver\n");
> > > > > 
> > > > > You're not changing this part, and I don't understand exactly how LTR
> > > > > works, but it makes me a little bit queasy to read "set the LTR value
> > > > > to the maximum required to allow the deepest power management
> > > > > savings" and then we set the max snoop values to a fixed constant.
> > > > > 
> > > > > I don't think the goal is to "allow the deepest power savings"; I
> > > > > think it's to enable L1.2 *when the device has enough buffering to
> > > > > absorb L1.2 entry/exit latencies*.
> > > > > 
> > > > > The spec (PCIe r6.0, sec 7.8.2.2) says "Software should set this to
> > > > > the platform's maximum supported latency or less," so it seems like
> > > > > that value must be platform-dependent, not fixed.
> > > > > 
> > > > > And I assume the "_DSM for Latency Tolerance Reporting" is part of the
> > > > > way to get those platform-dependent values, but Linux doesn't actually
> > > > > use that yet.
> > > > 
> > > > This may indeed be the best way but we need to double check with our
> > > > BIOS folks.  AFAIK BIOS writes the LTR values directly so there
> > > > hasn't been a need to use this _DSM. But under VMD the ports are
> > > > hidden from BIOS which is why we added it here. I've brought up the
> > > > question internally to find out how Windows handles the DSM and to
> > > > get a recommendation from our firmware leads.
> > > 
> > > We want Linux to be able to program LTR itself, don't we?  We
> > > shouldn't have to rely on firmware to do it.  If Linux can't do
> > > it, hot-added devices aren't going to be able to use L1.2,
> > > right?
> > 
> > Agreed. We just want to make sure we are not conflicting with what
> > BIOS may be doing.
> 
> So the feedback is to run the _DSM and just overwrite any BIOS
> values. Looking up the _DSM I saw there was an attempt to upstream
> this 4 years ago [1]. I'm not sure why the effort stalled but we can
> pick up this work again.
> 
> https://patchwork.kernel.org/project/linux-pci/patch/20201015080311.7811-1-puranjay12@xxxxxxxxx/

There was a PCI SIG discussion about this a few years ago that never
really seemed to get resolved:
https://members.pcisig.com/wg/PCIe-Protocol/mail/thread/35064

Unfortunately that discussion is not public, but the summary is:

  Q: How is the LTR_L1.2_THRESHOLD value determined?

     PCIe r5.0, sec 5.5.4, says the same value must be programmed into
     both Ports.

     A: As noted in sec 5.5.4, the value is determined primarily by
	the amount of time it will take to re-establish the common
	mode bias on the AC coupling caps, and it is assumed that the
	BIOS knows this.

  Q: How are the LTR Max Snoop values determined?

     PCI Firmware r3.3, sec 4.6.6, says the LTR _DSM reports the max
     values for each Downstream Port embedded in the platform, and the
     OS should calculate latencies along the path between each
     Downstream Port and any Upstream Port (Switch Upstream Port or
     Endpoint).

     Of course, Switches not embedded in the platform (e.g., external
     Thunderbolt hierarchies) will not have this _DSM, but I assume
     they should contribute to this sum?

     A: The fundamental problem is that there is no practical way for
	software to discover the AC coupling capacitor sizes and
	common mode bias circuit impedance.

	Software could compute conservative values, but they would
	likely be 10x worse than typical, so the L1.2 exit latency
	would be significantly longer than actually required to be.

	The interoperability issues here were understood when
	designing L1 Substates, but no viable solution was found.

So the main reason Puranjay's work got stalled is that I didn't feel
confident enough that we understood how to do this, especially for
external devices.

It would be great if somebody *did* feel confident about interpreting
and implementing all this.

Bjorn




[Index of Archives]     [DMA Engine]     [Linux Coverity]     [Linux USB]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]     [Greybus]

  Powered by Linux