Re: [Nouveau] [PATCH v2 4/4] pci: save the boot pcie link speed and restore it on fini

Ilia Mirkin <imirkin@xxxxxxxxxxxx> · Tue, 21 May 2019 09:50:55 -0400

On Tue, May 21, 2019 at 9:29 AM Karol Herbst <kherbst@xxxxxxxxxx> wrote:
>
> On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> >
> > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > Apparently things go south if we suspend the device with a different PCIE
> > > > > link speed set than it got booted with. Fixes runtime suspend on my gp107.
> > > > >
> > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > fix there instead of nouveau, but maybe there is no real nice way of doing
> > > > > that outside of drivers?
> > > >
> > > > I agree it would be nice to fix this in the PCI core if that's
> > > > feasible.
> > > >
> > > > It looks like this driver changes the PCIe link speed using some
> > > > device-specific mechanism.  When we suspend, we put the device in
> > > > D3cold, so it loses all its state.  When we resume, the link probably
> > > > comes up at the boot speed because nothing did that device-specific
> > > > magic to change it, so you probably end up with the link being slow
> > > > but the driver thinking it's configured to be fast, and maybe that
> > > > combination doesn't work.
> > > >
> > > > If it requires something device-specific to change that link speed, I
> > > > don't know how to put that in the PCI core.  But maybe I'm missing
> > > > something?
> > > >
> > > > Per the PCIe spec (r4.0, sec 1.2):
> > > >
> > > >   Initialization – During hardware initialization, each PCI Express
> > > >   Link is set up following a negotiation of Lane widths and frequency
> > > >   of operation by the two agents at each end of the Link. No firmware
> > > >   or operating system software is involved.
> > > >
> > > > I have been assuming that this means device-specific link speed
> > > > management is out of spec, but it seems pretty common that devices
> > > > don't come up by themselves at the fastest possible link speed.  So
> > > > maybe the spec just intends that devices can operate at *some* valid
> > > > speed.
> > >
> > > I would expect that devices kind of have to figure out what they can
> > > operate on and the operating system kind of just checks what the
> > > current state is and doesn't try to "restore" the old state or
> > > something?
> >
> > The devices at each end of the link negotiate the width and speed of
> > the link.  This is done directly by the hardware without any help from
> > the OS.
> >
> > The OS can read the current link state (Current Link Speed and
> > Negotiated Link Width, both in the Link Status register).  The OS has
> > very little control over that state.  It can't directly restore the
> > state because the hardware has to negotiate a width & speed that
> > result in reliable operation.
> >
> > > We don't do anything in the driver after the device was suspended. And
> > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > essentially. I have no idea how much of this is part of the actual pci
> > > standard and how much is driver specific. But the driver also wants to
> > > have some control over the link speed as it's tied to performance
> > > states on GPU.
> >
> > As far as I'm aware, there is no generic PCIe way for the OS to
> > influence the link width or speed.  If the GPU driver needs to do
> > that, it would be via some device-specific mechanism.
> >
> > > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > > init mechanism decreases it to 2.5. If we suspend, the GPU or at least
> > > the communication with the controller is broken. But if we set it to
> > > the boot speed, resuming the GPU just works. So my assumption was,
> > > that _something_ (might it be the controller or the pci subsystem)
> > > tries to force to operate on an invalid link speed and because the
> > > bridge controller is actually powered down as well (as all children
> > > are in D3cold) I could imagine that something in the pci subsystem
> > > actually restores the state which lets the controller fail to
> > > establish communication again?
> >
> >   1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> >      without OS/driver intervention.
> >
> >   2) Some mechanism reduces link speed to 2.5 GT/s.  This probably
> >      requires driver intervention or at least some ACPI method.
> >
>
> there is no driver intervention and Nouveau doesn't care at all. It's
> all done on the GPU. We just upload a script and some firmware on to
> the GPU. The script runs then on the PMU inside the GPU and this
> script also causes changing the PCIe link settings. But from a Nouveau
> point of view we don't care about the link before or after that script
> was invoked. Also there is no ACPI method involved.
>
> But if there is something we should notify pci core about, maybe
> that's something we have to do then?
>
> >   3) Suspend puts GPU into D3cold (powered off).
> >
> >   4) Resume restores GPU to D0, and the Port and GPU hardware again
> >      negotiate 8.0 GT/s without OS/driver intervention, just like at
> >      initial boot.
> >
>
> No, that negotiation fails apparently as any attempt to read anything
> from the device just fails inside pci core. Or something goes wrong
> when resuming the bridge controller.
>
> >   5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> >      8.0 GT/s.
> >
>
> what is actually meant by "driver" here? The pci subsystem or Nouveau?
>
> > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > why the GPU wouldn't work after resume.  From a PCIe point of view,
> > the link is supposed to work and the device should be reachable
> > independent of the link speed.  But maybe there's some weird
> > dependency between the GPU and the driver here.
> >
>
> but the device isn't reachable at all, not even from the pci
> subsystem. All reads fail/return a default error value (0xffffffff).
>
> > It sounds like things work if you return to 8.0 GT/s before suspend.
> > That would make sense to me because then the driver's idea of the
> > link state after resume would match the actual state.
> >
>
> depends on what is meant by the driver here. Inside Nouveau we don't
> care one bit about the current link speed, so I assume you mean
> something inside the pci core code?
>
> > But I don't see a way to deal with this in the PCI core.  The PCI core
> > does save and restore most of the architected config space around
> > suspend/resume, but since this appears to be a device-specific thing,
> > the PCI core would have no idea how to save/restore it.
> >
>
> if we assume that the negotiation on a device level works as intended,
> then I would expect this to be a pci core issue, which might actually
> be not fixable there. But if it's not, then we would have to put
> something like that inside the runpm documentation to tell drivers
> they have to do something about it.
>
> But again, for me it just sounds like the negotiation on the device
> level fails or something inside pci core messes it up.

Bjorn -- nouveau has a way of requesting that the GPU change PCIe
settings. It sets the PCIe version to the max version (esp older GPUs
tended to boot as PCIe 1.0, and had to be set to 2.0/3.0 "by hand"),
and then the link speed is adjusted based on the perf level settings
by writing to a PCI config-ish mmio space -- however on the GPUs that
Karol is talking about, we can't do the perf level adjustments, so
nouveau never touches the speed. (Does it touch the PCIe version? Not
100% sure ... Karol?) In this case, it sounds like it's firmware
running on the GPU which is doing this (probably using the exact same
mechanism nouveau would -- those internal engines also have access to
the mmio space).

Perhaps there's a way to capture PCI config space of both the GPU and
its link partner, to see if there's anything obviously wrong? (But
even if there is, doesn't sound like we have too much recourse...)
From the sounds of it, the two link partners disagree on settings
somehow and don't establish a proper link.
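
As a rough illustration of what such a capture could look like, here is
a minimal Python sketch that decodes Current Link Speed and Negotiated
Link Width from a raw config space dump. It assumes Linux sysfs exposes
the dump at /sys/bus/pci/devices/<bdf>/config (reading past the first
64 bytes typically needs root). The helper names and the two device
addresses in the usage note are hypothetical, not anything from nouveau
or the PCI core. Run it on both ends before suspend and again after
resume, then compare.

```python
import struct

# Offsets and IDs from the PCI/PCIe specs (same values as in pci_regs.h).
PCI_STATUS = 0x06            # Status register in the standard header
PCI_STATUS_CAP_LIST = 0x10   # bit 4: capability list present
PCI_CAPABILITY_LIST = 0x34   # pointer to the first capability
PCI_CAP_ID_EXP = 0x10        # PCI Express capability ID
PCI_EXP_LNKSTA = 0x12        # Link Status, relative to the PCIe capability

SPEEDS = {1: "2.5 GT/s", 2: "5.0 GT/s", 3: "8.0 GT/s", 4: "16.0 GT/s"}


def find_pcie_cap(cfg: bytes):
    """Walk the capability list; return the PCIe capability offset or None."""
    if len(cfg) < 0x40:
        return None
    (status,) = struct.unpack_from("<H", cfg, PCI_STATUS)
    if not status & PCI_STATUS_CAP_LIST:
        return None
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guard against a looping capability list
    while pos >= 0x40 and pos + 1 < len(cfg) and pos not in seen:
        seen.add(pos)
        if cfg[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return None


def link_status(cfg: bytes):
    """Return (speed, width) decoded from Link Status, or None if absent."""
    # An unreachable device reads back as all ones (0xffffffff).
    if len(cfg) >= 4 and cfg[:4] == b"\xff" * 4:
        raise ValueError("config space reads as 0xffffffff: device unreachable")
    cap = find_pcie_cap(cfg)
    if cap is None or cap + PCI_EXP_LNKSTA + 2 > len(cfg):
        return None
    (lnksta,) = struct.unpack_from("<H", cfg, cap + PCI_EXP_LNKSTA)
    speed = lnksta & 0xF          # bits 3:0, Current Link Speed
    width = (lnksta >> 4) & 0x3F  # bits 9:4, Negotiated Link Width
    return SPEEDS.get(speed, f"unknown ({speed})"), width


def read_config(bdf: str) -> bytes:
    """Read a device's config space dump from sysfs (Linux only)."""
    with open(f"/sys/bus/pci/devices/{bdf}/config", "rb") as f:
        return f.read()
```

Usage would be something like link_status(read_config("0000:01:00.0"))
for the GPU and link_status(read_config("0000:00:01.0")) for the
upstream port; both addresses are placeholders, so substitute whatever
lspci reports on the affected machine. This only reads the architected
Link Status register, so it won't show whatever device-specific state
the PMU script changes.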

  -ilia