On Thu, Jan 11, 2024 at 12:28 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
>
> On Mon, Nov 13, 2023 at 01:56:06PM -0500, Jim Quinlan wrote:
> > The Broadcom STB/CM PCIe HW core, which is also used in RPi SOCs, must be
> > deliberately set by the PCIe RC HW into one of three mutually exclusive
> > modes:
> >
> > "safe" -- No CLKREQ# expected or required; refclk is always provided.  This
> > mode should work for all devices but is not capable of any refclk
> > power savings.
> >
> > "no-l1ss" -- CLKREQ# is expected to be driven by the downstream device for
> > CPM and ASPM L0s and L1.  Provides Clock Power Management, L0s, and L1,
> > but cannot provide L1 substate (L1SS) power savings.  If the downstream
> > device connected to the RC is L1SS capable AND the OS enables L1SS, all
> > PCIe traffic may abruptly halt, potentially hanging the system.
> >
> > "default" -- Bidirectional CLKREQ# between the RC and downstream device.
> > Provides ASPM L0s, L1, and L1SS, but is not compliant with Clock
> > Power Management; specifically, it may not be able to meet the T_CLRon max
> > timing of 400ns as specified in "Dynamic Clock Control", section
> > 3.2.5.2.2 of the PCI Express Mini CEM 2.1 specification.  This
> > situation is atypical and should happen only with older devices.
> >
> > Previously, this driver always set the mode to "no-l1ss", as almost all
> > STB/CM boards operate in this mode.  But now there is interest in
> > activating L1SS power savings from STB/CM customers, which requires "aspm"
> > mode.
>
> I think this should read "default" mode, not "aspm" mode, since "aspm"
> is not a mode implemented by this patch, right?

Correct.

> > In addition, a bug was filed for the RPi4 CM platform because most
> > devices did not work in "no-l1ss" mode.
>
> I think this refers to bug 217276, mentioned below?

I guess you are saying I should put a footnote marker there.

> > Note that the mode is specified by the DT property "brcm,clkreq-mode".  If
> > this property is omitted, then "default" mode is chosen.
> >
> > Note: Since L1 substates are now possible, a modification was made
> > regarding an internal bus timeout: During long periods of the PCIe RC HW
> > being in an L1SS sleep state, there may be a timeout on an internal bus
> > access, even though there may not be any PCIe access involved.  Such a
> > timeout will cause a subsequent CPU abort.
>
> This sounds scary.  If a NIC is put in L1.2, does this mean we will
> see this CPU abort if there's no traffic for a long time?  What is
> needed to avoid the CPU abort?

I don't think this happens in normal practice, as there is a slew of
low-level TLPs and LTR messages sent on a regular basis.  The only time
this timeout occurred was when a major customer was doing a hack: IIRC,
their endpoint device has to reboot itself after link-up and driver
probe, so it goes into L1.2 to execute this reboot, and while doing so
the connection is completely silent.

> What does this mean for users?  L1SS is designed for long periods of
> the device being idle, so this leaves me feeling that using L1SS is
> unsafe in general.  Hopefully this impression is unwarranted, and all
> we need is some clarification here.

I don't think it will affect most users, if any.

Regards,
Jim Quinlan
Broadcom STB/CM

> > Link: https://bugzilla.kernel.org/show_bug.cgi?id=217276
> >
> > Signed-off-by: Jim Quinlan <james.quinlan@xxxxxxxxxxxx>
> > Tested-by: Florian Fainelli <florian.fainelli@xxxxxxxxxxxx>
> > ---
> >  drivers/pci/controller/pcie-brcmstb.c | 96 ++++++++++++++++++++++++---
> >  1 file changed, 86 insertions(+), 10 deletions(-)
> > ...