Re: [PATCH v8 2/2] PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream device

Bjorn Helgaas <helgaas@xxxxxxxxxx> · Thu, 11 Jan 2024 14:54:04 -0600

On Thu, Jan 11, 2024 at 01:20:48PM -0500, Jim Quinlan wrote:
> On Thu, Jan 11, 2024 at 12:28 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > On Mon, Nov 13, 2023 at 01:56:06PM -0500, Jim Quinlan wrote:

> > > Previously, this driver always set the mode to "no-l1ss", as almost all
> > > STB/CM boards operate in this mode.  But now there is interest in
> > > activating L1SS power savings from STB/CM customers, which requires "aspm"
> > > mode.
> >
> > I think this should read "default" mode, not "aspm" mode, since "aspm"
> > is not a mode implemented by this patch, right?
> 
> Correct.

Thanks, I changed that locally.

> > > In addition, a bug was filed for RPi4 CM platform because most
> > > devices did not work in "no-l1ss" mode.
> >
> > I think this refers to bug 217276, mentioned below?
> 
> I guess you are saying I should put a footnote marker there.

I added a hint here.

> > > Note: Since L1 substates are now possible, a modification was made
> > > regarding an internal bus timeout: During long periods of the PCIe RC HW
> > > being in an L1SS sleep state, there may be a timeout on an internal bus
> > > access, even though there may not be any PCIe access involved.  Such a
> > > timeout will cause a subsequent CPU abort.
> >
> > This sounds scary.  If a NIC is put in L1.2, does this mean we will
> > see this CPU abort if there's no traffic for a long time?  What is
> > needed to avoid the CPU abort?
> 
> I don't think this happens in normal practice as there are a slew
> of low-level TLPs and LTR messages that are sent on a regular
> basis.

OK, I'll have to take your word for this.  I don't know enough about
PCIe to know what sort of periodic transmissions are required when a
device is idle.

LTR messages are required when endpoint service requirements change,
but I wouldn't expect those if the device is idle.

> The only time this timeout occurred was when a major customer
> was doing a hack: IIRC, their endpoint device has to reboot itself
> after link-up and driver probe, so it goes into L1.2 while it
> reboots, and during that time the connection is completely
> silent.

> > What does this mean for users?  L1SS is designed for long periods of
> > the device being idle, so this leaves me feeling that using L1SS is
> > unsafe in general.  Hopefully this impression is unwarranted, and all
> > we need is some clarification here.
> 
> I don't think it will affect most users, if any.

I'll try to get this into -next today or tomorrow.

Bjorn