Re: [PATCH] drm/amdgpu: Don't enable LTR if not supported

On 9/8/2022 11:27 PM, Bjorn Helgaas wrote:
> On Thu, Sep 08, 2022 at 04:42:38PM +0000, Lazar, Lijo wrote:
>> I am not sure if ASPM settings can be generalized by the PCIe core.
>> Performance vs. power savings when ASPM is enabled will require some
>> additional tuning, and that will be device specific.
>
> Can you elaborate on this?  In the universe of drivers, very few do
> their own ASPM configuration, and it's usually to work around hardware
> defects, e.g., L1 doesn't work on some e1000e devices, L0s doesn't
> work on some iwlwifi devices, etc.
>
> The core does know how to configure all the ASPM features defined in
> the PCIe spec, e.g., L0s, L1, L1.1, L1.2, and LTR.
>
>> In some of the other ASICs, this programming is done in VBIOS/SBIOS
>> firmware. Having it in the driver provides the advantage of additional
>> tuning without forcing a VBIOS upgrade.
>
> I think it's clearly the intent of the PCIe spec that ASPM
> configuration be done by generic code.  Here are some things that
> require a system-level view, not just an individual device view:
>
>    - L0s, L1, and L1 Substates cannot be enabled unless both ends
>      support it (PCIe r6.0, secs 5.4.1.4, 7.5.3.7, 5.5.4).
>
>    - Devices advertise the "Acceptable Latency" they can accept for
>      transitions from L0s or L1 to L0, and the actual latency depends
>      on the "Exit Latencies" of all the devices in the path to the Root
>      Port (sec 5.4.1.3.2).
>
>    - LTR (required by L1.2) cannot be enabled unless it is already
>      enabled in all upstream devices (sec 6.18).  This patch relies on
>      "ltr_path", which works now but relies on the PCI core never
>      reconfiguring the upstream path.
>
> There might be amdgpu-specific features the driver needs to set up,
> but if drivers fiddle with architected features like LTR behind the
> PCI core's back, things are likely to break.
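
To make that system-level view concrete, a minimal sketch along these
lines using the standard config accessors. Both helpers are
hypothetical illustrations, not PCI core code and not part of this
patch:

#include <linux/pci.h>

/* Both ends of a link must advertise ASPM L1 before it may be enabled. */
static bool link_supports_aspm_l1(struct pci_dev *pdev)
{
	struct pci_dev *bridge = pci_upstream_bridge(pdev);
	u32 up = 0, down = 0;

	if (!bridge)
		return false;

	pcie_capability_read_dword(pdev, PCI_EXP_LNKCAP, &down);
	pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &up);
	return (up & PCI_EXP_LNKCAP_ASPM_L1) && (down & PCI_EXP_LNKCAP_ASPM_L1);
}

/* LTR may be enabled only if every device upstream already has it on. */
static bool ltr_enabled_in_path(struct pci_dev *pdev)
{
	struct pci_dev *bridge;
	u16 devctl2;

	for (bridge = pci_upstream_bridge(pdev); bridge;
	     bridge = pci_upstream_bridge(bridge)) {
		pcie_capability_read_word(bridge, PCI_EXP_DEVCTL2, &devctl2);
		if (!(devctl2 & PCI_EXP_DEVCTL2_LTR_EN))
			return false;
	}
	return true;
}

A driver doing this by hand has exactly the fragility noted above: the
result is only valid until the core reconfigures the upstream path.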


The programming is mostly related to L1 entry conditions, and the spec
leaves those to the implementation.

From the r4.0 spec:

"This specification does not dictate when a component with an Upstream
Port must initiate a transition to the L1 state. The interoperable
mechanisms for transitioning into and out of L1 are defined within this
specification; however, the specific ASPM policy governing when to
transition into L1 is left to the implementer.
...
Another approach would be for the Downstream device to initiate a
transition to the L1 state once the Link has been idle in L0 for a set
amount of time."

Some of the programming, like the snippet below, relates to the timing
of L1 entry.

	/* Tune the vlink ASPM idle and PM L1 entry timer straps */
	def = data = RREG32_SOC15(NBIO, 0, regRCC_STRAP0_RCC_BIF_STRAP3);
	data |= 0x5DE0 << RCC_BIF_STRAP3__STRAP_VLINK_ASPM_IDLE_TIMER__SHIFT;
	data |= 0x0010 << RCC_BIF_STRAP3__STRAP_VLINK_PM_L1_ENTRY_TIMER__SHIFT;
	if (def != data)
		WREG32_SOC15(NBIO, 0, regRCC_STRAP0_RCC_BIF_STRAP3, data);

Similarly for LTR: since it provides a dynamic mechanism for reporting
latency tolerance while in L1 substates, the tolerance timings can be
tuned through registers, though there is a threshold.
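
For reference, an LTR latency is encoded as a 10-bit value plus a
3-bit scale, where latency = value * 32^scale ns, and the reported
tolerance is compared against the L1.2 threshold to decide whether
L1.2 may be entered. A hypothetical helper (not amdgpu code), just to
illustrate the encoding:

#include <linux/types.h>

static u16 ltr_encode_ns(u64 ns)
{
	unsigned int scale = 0;

	/* Value field is 10 bits; each scale step multiplies the unit by 32. */
	while (ns > 0x3ff && scale < 5) {
		ns >>= 5;
		scale++;
	}
	/*
	 * Bits [9:0] = value, bits [12:10] = scale.  Truncation rounds the
	 * reported tolerance down, which is the conservative direction.
	 */
	return (u16)((scale << 10) | (ns & 0x3ff));
}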

Regardless, Alex is already checking with the hardware design team on
possible improvements.

Thanks,
Lijo

From: Alex Deucher <alexdeucher@xxxxxxxxx>
> On Thu, Sep 8, 2022 at 12:12 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>>
>> Do you know why the driver configures ASPM itself?  If the PCI core is
>> doing something wrong (and I'm sure it is, ASPM support is kind of a
>> mess), I'd much prefer to fix up the core where *all* drivers can
>> benefit from it.

> This is the programming sequence we get from our hardware team, and it
> is used on both Windows and Linux.  As far as I understand it, Windows
> doesn't handle this in the core; it's up to the individual drivers to
> enable it.  I'm not familiar with how this should be enabled
> generically, but at least for our hardware, it seems to have some
> variation compared to what is done in the PCI core due to stability,
> etc.  It seems to me that this may need ASIC-specific implementations
> for a lot of hardware, depending on the required programming sequences.
> E.g., various ASICs may need hardware workarounds for bugs or platform
> issues, etc.  I can ask for more details from our hardware team.
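
(To illustrate what ASIC-specific implementations tend to look like, a
simplified sketch of a per-ASIC dispatch table; the structure and
function names here are hypothetical, not amdgpu's actual ones:)

struct amdgpu_device;	/* the driver's per-device context */

struct aspm_funcs {
	/* each ASIC revision supplies its own validated sequence */
	void (*program_aspm)(struct amdgpu_device *adev);
};

static void program_device_aspm(struct amdgpu_device *adev,
				const struct aspm_funcs *funcs)
{
	if (funcs && funcs->program_aspm)
		funcs->program_aspm(adev);
}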

If the PCI core has stability issues, I want to fix them.  This
hardware may have its own stability issues, and I would ideally like
to have drivers use interfaces like pci_disable_link_state() to avoid
broken things.  Maybe we need new interfaces for more subtle kinds of
breakage.

Bjorn
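
For completeness, pci_disable_link_state() lets a driver mask off a
link state it knows is broken on its hardware while leaving the rest
of ASPM policy to the core. A minimal sketch; the choice of L1.2 here
is a hypothetical example, not a statement about amdgpu parts:

#include <linux/pci.h>

static void example_aspm_quirk(struct pci_dev *pdev)
{
	/* keep the core's ASPM policy, but never allow the unstable state */
	pci_disable_link_state(pdev, PCIE_LINK_STATE_L1_2);
}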



