On Tue, May 21, 2019 at 3:51 PM Ilia Mirkin <imirkin@xxxxxxxxxxxx> wrote:
>
> On Tue, May 21, 2019 at 9:29 AM Karol Herbst <kherbst@xxxxxxxxxx> wrote:
> >
> > On Tue, May 21, 2019 at 3:11 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > >
> > > On Tue, May 21, 2019 at 12:30:38AM +0200, Karol Herbst wrote:
> > > > On Mon, May 20, 2019 at 11:20 PM Bjorn Helgaas <helgaas@xxxxxxxxxx> wrote:
> > > > > On Tue, May 07, 2019 at 10:12:45PM +0200, Karol Herbst wrote:
> > > > > > Apparently things go south if we suspend the device with a different PCIE
> > > > > > link speed set than it got booted with. Fixes runtime suspend on my gp107.
> > > > > >
> > > > > > This all looks like some bug inside the pci subsystem and I would prefer a
> > > > > > fix there instead of nouveau, but maybe there is no real nice way of doing
> > > > > > that outside of drivers?
> > > > >
> > > > > I agree it would be nice to fix this in the PCI core if that's
> > > > > feasible.
> > > > >
> > > > > It looks like this driver changes the PCIe link speed using some
> > > > > device-specific mechanism. When we suspend, we put the device in
> > > > > D3cold, so it loses all its state. When we resume, the link probably
> > > > > comes up at the boot speed because nothing did that device-specific
> > > > > magic to change it, so you probably end up with the link being slow
> > > > > but the driver thinking it's configured to be fast, and maybe that
> > > > > combination doesn't work.
> > > > >
> > > > > If it requires something device-specific to change that link speed, I
> > > > > don't know how to put that in the PCI core. But maybe I'm missing
> > > > > something?
> > > > >
> > > > > Per the PCIe spec (r4.0, sec 1.2):
> > > > >
> > > > >   Initialization – During hardware initialization, each PCI Express
> > > > >   Link is set up following a negotiation of Lane widths and frequency
> > > > >   of operation by the two agents at each end of the Link.
> > > > >   No firmware or operating system software is involved.
> > > > >
> > > > > I have been assuming that this means device-specific link speed
> > > > > management is out of spec, but it seems pretty common that devices
> > > > > don't come up by themselves at the fastest possible link speed. So
> > > > > maybe the spec just intends that devices can operate at *some* valid
> > > > > speed.
> > > >
> > > > I would expect that devices kind of have to figure out what they can
> > > > operate on and the operating system kind of just checks what the
> > > > current state is and doesn't try to "restore" the old state or
> > > > something?
> > >
> > > The devices at each end of the link negotiate the width and speed of
> > > the link. This is done directly by the hardware without any help from
> > > the OS.
> > >
> > > The OS can read the current link state (Current Link Speed and
> > > Negotiated Link Width, both in the Link Status register). The OS has
> > > very little control over that state. It can't directly restore the
> > > state because the hardware has to negotiate a width & speed that
> > > result in reliable operation.
> > >
> > > > We don't do anything in the driver after the device was suspended. And
> > > > the 0x88000 is a mirror of the PCI config space, but we also got some
> > > > PCIe stuff at 0x8c000 which is used by newer GPUs for gen3 stuff
> > > > essentially. I have no idea how much of this is part of the actual pci
> > > > standard and how much is driver specific. But the driver also wants to
> > > > have some control over the link speed as it's tied to performance
> > > > states on GPU.
> > >
> > > As far as I'm aware, there is no generic PCIe way for the OS to
> > > influence the link width or speed. If the GPU driver needs to do
> > > that, it would be via some device-specific mechanism.
> > >
> > > > The big issue here is just, that the GPU boots with 8.0, some on-gpu
> > > > init mechanism decreases it to 2.5.
> > > > If we suspend, the GPU or at least
> > > > the communication with the controller is broken. But if we set it to
> > > > the boot speed, resuming the GPU just works. So my assumption was,
> > > > that _something_ (might it be the controller or the pci subsystem)
> > > > tries to force to operate on an invalid link speed and because the
> > > > bridge controller is actually powered down as well (as all children
> > > > are in D3cold) I could imagine that something in the pci subsystem
> > > > actually restores the state which lets the controller fail to
> > > > establish communication again?
> > >
> > > 1) At boot-time, the Port and the GPU hardware negotiate 8.0 GT/s
> > > without OS/driver intervention.
> > >
> > > 2) Some mechanism reduces link speed to 2.5 GT/s. This probably
> > > requires driver intervention or at least some ACPI method.
> > >
> >
> > there is no driver intervention and Nouveau doesn't care at all. It's
> > all done on the GPU. We just upload a script and some firmware on to
> > the GPU. The script then runs on the PMU inside the GPU and this
> > script also causes changing the PCIe link settings. But from a Nouveau
> > point of view we don't care about the link before or after that script
> > was invoked. Also there is no ACPI method involved.
> >
> > But if there is something we should notify pci core about, maybe
> > that's something we have to do then?
> >
> > > 3) Suspend puts GPU into D3cold (powered off).
> > >
> > > 4) Resume restores GPU to D0, and the Port and GPU hardware again
> > > negotiate 8.0 GT/s without OS/driver intervention, just like at
> > > initial boot.
> > >
> >
> > No, that negotiation fails apparently as any attempt to read anything
> > from the device just fails inside pci core. Or something goes wrong
> > when resuming the bridge controller.
> >
> > > 5) Now the driver thinks the GPU is at 2.5 GT/s but it's actually at
> > > 8.0 GT/s.
> > >
> >
> > what is actually meant by "driver" here?
> > The pci subsystem or Nouveau?
> >
> > > Without knowing more about the transition to 2.5 GT/s, I can't guess
> > > why the GPU wouldn't work after resume. From a PCIe point of view,
> > > the link is supposed to work and the device should be reachable
> > > independent of the link speed. But maybe there's some weird
> > > dependency between the GPU and the driver here.
> > >
> >
> > but the device isn't reachable at all, not even from the pci
> > subsystem. All reads fail/return a default error value (0xffffffff).
> >
> > > It sounds like things work if you return to 8.0 GT/s before suspend.
> > > That would make sense to me because then the driver's idea of the
> > > link state after resume would match the actual state.
> > >
> >
> > depends on what is meant by the driver here. Inside Nouveau we don't
> > care one bit about the current link speed, so I assume you mean
> > something inside the pci core code?
> >
> > > But I don't see a way to deal with this in the PCI core. The PCI core
> > > does save and restore most of the architected config space around
> > > suspend/resume, but since this appears to be a device-specific thing,
> > > the PCI core would have no idea how to save/restore it.
> > >
> >
> > if we assume that the negotiation on a device level works as intended,
> > then I would expect this to be a pci core issue, which might actually
> > be not fixable there. But if it's not, then we would have to put
> > something like that inside the runpm documentation to tell drivers
> > they have to do something about it.
> >
> > But again, for me it just sounds like the negotiation on the device
> > level fails or something inside pci core messes it up.
>
> Bjorn -- nouveau has a way of requesting that the GPU change PCIe
> settings.
> It sets the PCIe version to the max version (esp older GPUs
> tended to boot as PCIe 1.0, and had to be set to 2.0/3.0 "by hand"),
> and then the link speed is adjusted based on the perf level settings
> by writing to a PCI config-ish mmio space -- however on the GPUs that
> Karol is talking about, we can't do the perf level adjustments, so
> nouveau never touches the speed. (Does it touch the PCIe version? Not
> 100% sure ... Karol?)

I think we only do it if the GPU comes up as v1, but that was mainly a
tesla thing, saw it on Fermi a few times, but never on newer chips. And
we also only do it if the pci->func->pcie.version callback was set
(which we don't do on Pascal, and this is the gen where we have the
runpm issue).

> In this case, it sounds like it's firmware
> running on the GPU which is doing this (probably using the exact same
> mechanism nouveau would -- those internal engines also have access to
> the mmio space).
>
> Perhaps there's a way to capture PCI config space of both the GPU and
> its link partner, to see if there's anything obviously wrong? (But
> even if there is, doesn't sound like we have too much recourse...)
> From the sounds of it, the two link partners disagree on settings
> somehow and don't establish a proper link.
>
> -ilia
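
For anyone who wants to try the "capture PCI config space of both the GPU and its link partner" idea, here is a rough user-space sketch of how a raw config-space dump could be decoded. It is only an illustration, not what nouveau or the PCI core does: it walks the standard capability list to the PCI Express capability, reads Current Link Speed and Negotiated Link Width from Link Status, and treats an all-ones first dword (the 0xffffffff reads mentioned above) as "device unreachable". The helper names are made up for this example; the register offsets and bit fields are the standard PCIe ones.

```python
import struct

PCI_CAPABILITY_LIST = 0x34   # config offset of the first capability pointer
PCI_CAP_ID_EXP = 0x10        # PCI Express capability ID
PCI_EXP_LNKSTA = 0x12        # Link Status offset inside the PCIe capability

# Current Link Speed encodings (Link Status bits 3:0)
LINK_SPEEDS = {1: "2.5 GT/s", 2: "5.0 GT/s", 3: "8.0 GT/s", 4: "16.0 GT/s"}


def device_unreachable(cfg: bytes) -> bool:
    """True if vendor/device ID read back as all ones (hypothetical helper)."""
    return len(cfg) < 4 or struct.unpack_from("<I", cfg, 0)[0] == 0xFFFFFFFF


def find_pcie_cap(cfg: bytes) -> int:
    """Walk the capability list; return the PCIe capability offset, or 0."""
    if len(cfg) < 64:
        return 0
    pos = cfg[PCI_CAPABILITY_LIST] & 0xFC
    seen = set()  # guard against a looping capability list
    while pos and pos not in seen and pos + 1 < len(cfg):
        seen.add(pos)
        if cfg[pos] == PCI_CAP_ID_EXP:
            return pos
        pos = cfg[pos + 1] & 0xFC
    return 0


def link_status(cfg: bytes):
    """Return (speed, width) decoded from Link Status, or None."""
    if device_unreachable(cfg):
        return None
    cap = find_pcie_cap(cfg)
    if not cap or cap + PCI_EXP_LNKSTA + 2 > len(cfg):
        return None
    (lnksta,) = struct.unpack_from("<H", cfg, cap + PCI_EXP_LNKSTA)
    speed = lnksta & 0xF            # Current Link Speed, bits 3:0
    width = (lnksta >> 4) & 0x3F    # Negotiated Link Width, bits 9:4
    return LINK_SPEEDS.get(speed, f"unknown ({speed})"), width
```

On Linux the input could come from the sysfs `config` file of each device, e.g. `open("/sys/bus/pci/devices/0000:01:00.0/config", "rb").read()` (the address is a placeholder, and reading past the first 64 bytes usually needs root). Running it on both the GPU and the upstream port before suspend and after a failed resume would at least show whether the port side still reports a sane link while the GPU side reads all ones.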