On Thu, Jan 11, 2024 at 12:28 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Mon, Nov 13, 2023 at 01:56:06PM -0500, Jim Quinlan wrote:
> > The Broadcom STB/CM PCIe HW core, which is also used in RPi SOCs, must be
> > deliberately set by the PCIe RC HW into one of three mutually exclusive
> > modes:
> >
> > "safe" -- No CLKREQ# expected or required; refclk is always provided.  This
> > mode should work for all devices but is not capable of any refclk
> > power savings.
> >
> > "no-l1ss" -- CLKREQ# is expected to be driven by the downstream device for
> > CPM and ASPM L0s and L1.  Provides Clock Power Management, L0s, and L1,
> > but cannot provide L1 substate (L1SS) power savings.  If the downstream
> > device connected to the RC is L1SS capable AND the OS enables L1SS, all
> > PCIe traffic may abruptly halt, potentially hanging the system.
> >
> > "default" -- Bidirectional CLKREQ# between the RC and downstream device.
> > Provides ASPM L0s, L1, and L1SS, but is not compliant with Clock
> > Power Management; specifically, it may not be able to meet the T_CLRon max
> > timing of 400ns as specified in "Dynamic Clock Control", section
> > 3.2.5.2.2 of the PCI Express Mini CEM 2.1 specification.  This
> > situation is atypical and should happen only with older devices.
> >
> > Previously, this driver always set the mode to "no-l1ss", as almost all
> > STB/CM boards operate in this mode.  But now there is interest in
> > activating L1SS power savings from STB/CM customers, which requires "aspm"
> > mode.
>
> I think this should read "default" mode, not "aspm" mode, since "aspm"
> is not a mode implemented by this patch, right?

Correct.

> > In addition, a bug was filed for the RPi4 CM platform because most
> > devices did not work in "no-l1ss" mode.
>
> I think this refers to bug 217276, mentioned below?

I guess you are saying I should put a footnote marker there.

> > Note that the mode is specified by the DT property "brcm,clkreq-mode".  If
> > this property is omitted, then "default" mode is chosen.
> >
> > Note: Since L1 substates are now possible, a modification was made
> > regarding an internal bus timeout: During long periods of the PCIe RC HW
> > being in an L1SS sleep state, there may be a timeout on an internal bus
> > access, even though there may not be any PCIe access involved.  Such a
> > timeout will cause a subsequent CPU abort.
>
> This sounds scary.  If a NIC is put in L1.2, does this mean we will
> see this CPU abort if there's no traffic for a long time?  What is
> needed to avoid the CPU abort?

I don't think this happens in normal practice, as there is a slew of
low-level TLPs and LTR messages sent on a regular basis.  The only time
this timeout occurred was when a major customer was doing a hack: IIRC,
their endpoint device has to reboot itself after link-up and driver
probe, so it goes into L1.2 to execute this reboot, and while doing so
the connection is completely silent.

> What does this mean for users?  L1SS is designed for long periods of
> the device being idle, so this leaves me feeling that using L1SS is
> unsafe in general.  Hopefully this impression is unwarranted, and all
> we need is some clarification here.

I don't think it will affect most users, if any.

Regards,
Jim Quinlan
Broadcom STB/CM

> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217276
> >
> > Signed-off-by: Jim Quinlan <james.quinlan@xxxxxxxxxxxx>
> > Tested-by: Florian Fainelli <florian.fainelli@xxxxxxxxxxxx>
> > ---
> >  drivers/pci/controller/pcie-brcmstb.c | 96 ++++++++++++++++++++++++---
> >  1 file changed, 86 insertions(+), 10 deletions(-)
> > ...