Re: [PATCH] PCI: Re-enable downstream port LTR if it was previously enabled

Mika Westerberg <mika.westerberg@xxxxxxxxxxxxxxx> · Mon, 18 Jan 2021 14:58:09 +0200

On Fri, Jan 15, 2021 at 06:36:06PM -0600, Bjorn Helgaas wrote:
> On Fri, Jan 15, 2021 at 11:00:21AM +0200, Mika Westerberg wrote:
> > On Thu, Jan 14, 2021 at 06:10:07PM -0600, Bjorn Helgaas wrote:
> > > On Thu, Jan 14, 2021 at 04:47:24PM +0300, Mika Westerberg wrote:
> > > > PCIe r5.0, sec 7.5.3.16 says that the downstream ports must reset the
> > > > LTR enable bit if the link goes down (port goes DL_Down status). Now, if
> > > > we had LTR previously enabled and the PCIe endpoint gets hot-removed and
> > > > then hot-added back the ->ltr_path of the downstream port is still set
> > > > but the port now does not have the LTR enable bit set anymore.
> > > 
> > > IIRC LTR is only needed for L1.2, and of course the LTR Capability
> > > (Max Snoop/No-Snoop Latency registers) and the L1 PM Substates
> > > Capability (LTR_L1.2_THRESHOLD) must be programmed before enabling
> > > LTR.  For the bridge, I guess we're assuming those were programmed
> > > before the hot-remove, and they remain valid after the hot-add.
> > > 
> > > But what about the endpoint that we hot-added?  How do we program its
> > > LTR and L1 PM Substates Capabilities?  I know we have
> > > aspm_calc_l1ss_info() for L1 PM Substates, but I really don't trust
> > > it, and I don't think we do anything at all to program the LTR
> > > Capability.
> > 
> > True - we don't. However, enabling the LTR bit here for the endpoint
> > (and for the bridges all the way up to the root port) makes the endpoint
> > to report that there is no LTR requirement and that allows the SoC to do
> > some PM optimizations or so.
> > 
> > We actually see that if this is not re-programmed like this on a Tiger
> > Lake based ChromeBook S0ix fails (S0ix is Intel low power idle state).
> 
> So if we set PCI_EXP_DEVCTL2_LTR_EN for the bridge and also for the
> hot-added endpoint, S0ix works.  But we never get to the S0ix state if
> we don't set those bits?

Yes, correct.

> And we never actually set the Max Snoop/No-Snoop Latency registers in
> the LTR Capability, so they should power up as zeroes.  IIRC, that is
> the most conservative setting (PCIe r5.0, sec 6.18 says "all 0's
> indicates that the device will be impacted by any delay and that the
> best possible service is requested").
> 
> So I *guess* it makes some sort of sense to enable LTR in that
> situation, although I wish we could set the content of the messages
> before enabling them.

Right.

We actually have a bus agnostic API for that in PM QoS that requires the
PCI core to implement dev->power.set_latency_tolerance() and then
drivers can take advantage of this. However, it does not make much sense
to implement it without an actual user (driver).

> > > I used to think the LTR _DSM was a way to help us program the LTR
> > > Capability, and Puranjay did a nice job implementing support for it
> > > [1].  But I now think that _DSM doesn't give us enough information
> > > (and of course it doesn't help at all for non-ACPI systems or for
> > > hierarchies not integrated on the system board), so I didn't merge
> > > Puranjay's work.
> > > 
> > > I tried to have some discussion in the PCI SIG about this, but it
> > > never really went anywhere.  Here's my basic question, just for the
> > > archives:
> > > 
> > >   I think the LTR capability Max Snoop registers could also use some
> > >   clarification.  The base spec says "Software should set this to the
> > >   platform's maximum supported latency or less."  I assume this
> > >   platform data is supposed to come from the ACPI LTR _DSM.  The
> > >   firmware spec says software should sum the latencies along the path
> > >   between each downstream port (I wonder if this should say "Root
> > >   Port"?) and an endpoint that supports LTR.  Switches not embedded in
> > >   the platform will not have this _DSM, but I assume they contribute
> > >   to this sum.  But I don't know *what* they contribute.
> > > 
> > 
> > That's a fair question :)
> > 
> > I personally think that the driver for the specific hardware should know
> > what is the latency tolerance for the device when it is doing different
> > "tasks".
> > 
> > > > For this reason check if the bridge upstrea had LTR enabled set
> > > > previously and re-enable it before enabling LTR for the endpoint.
> > > 
> > > s/upstrea/upstream/
> > > s/enabled set/enabled/
> > 
> > Thanks, I'll fix those.
> > 
> > > Seems like there could be more things in the upstream bridge that need
> > > to be reprogrammed when the link comes back up (MPS, Common Clock
> > > Configuration, etc?).
> > > 
> > > I don't see anything in the spec about link status affecting MPS, but
> > > if we hot-removed a device that supported 4KB MPS and hot-added one
> > > that only support 128B, we might need more extensive reconfiguration.
> > > I haven't checked; maybe that's already covered?
> > 
> > It looks like that's covered in pcie_find_smpss(). It limits the MPS to
> > 128 if there are hotplug bridges in the topology. Assuming I read it
> > correctly.
> 
> OK, and it looks like that's called in the hot-add path:
> 
>   pciehp_configure_device
>     pcie_bus_configure_settings
>       pcie_find_smpss
> 
> so that's good.  I forgot that we had that path.
> 
> > > I think Common Clock Config also depends on characteristics of the
> > > hot-added device, so we might need to take a look at that, too.
> > 
> > I think pcie_aspm_configure_common_clock() takes care of that already.
> > It programs both ends of the link when a device is being added.
> 
> Yep, right.
> 
> > > If it turns out that we need to do more to the upstream bridge than
> > > just this LTR thing, I wonder if we should pull it out to some kind of
> > > "reconfig bridge" function so it's not buried in several random
> > > places.
> > 
> > It seems that at the moment it is only the LTR thing that needs to be
> > reconfigured as others looks like they are covered by their
> > corresponding "enable" functions. Not sure if it is better to move those
> > to "reconfig bridge" function because it might be harder to follow if it
> > is not close to the code that enables the feature in the first place.
> > 
> > But I can do that if you still think it is better.
> 
> No, I don't think it's worth doing that.

OK, thanks.