On Tue, Feb 12, 2019 at 05:02:30PM -0500, Lyude Paul wrote:
> On a very specific subset of ThinkPad P50 SKUs, particularly ones that
> come with a Quadro M1000M chip instead of the M2000M variant, the BIOS
> seems to have a very nasty habit of not always resetting the secondary
> Nvidia GPU between full reboots if the laptop is configured in Hybrid
> Graphics mode. The reason for this happening is unknown, but the
> following steps and possibly a good bit of patience will reproduce the
> issue:
>
> 1. Boot up the laptop normally in Hybrid graphics mode
> 2. Make sure nouveau is loaded and that the GPU is awake
> 3. Allow the nvidia GPU to runtime suspend itself after being idle
> 4. Reboot the machine, the more sudden the better (e.g. sysrq-b may help)
> 5. If nouveau loads up properly, reboot the machine again and go back to
>    step 2 until you reproduce the issue
>
> This results in some very strange behavior: the GPU will
> quite literally be left in exactly the same state it was in when the
> previously booted kernel started the reboot. This has all sorts of bad
> side effects: for starters, this completely breaks nouveau starting with a
> mysterious EVO channel failure that happens well before we've actually
> used the EVO channel for anything:
>
> nouveau 0000:01:00.0: disp: chid 0 mthd 0000 data 00000400 00001000
> 00000002
> ...
> So to do this, we add a new PCI quirk using
> DECLARE_PCI_FIXUP_CLASS_FINAL that will be invoked before the PCI probe
> at boot finishes. From there, we check to make sure that this is indeed
> the specific P50 variant of this GPU. We also make sure that the GPU PCI
> device is advertising NoReset- in order to prevent us from trying to
> reset the GPU when the machine is in Dedicated graphics mode (where the
> GPU being initialized by the BIOS is normal and expected). Finally, we
> try mapping the MMIO space for the GPU which should only work if the GPU
> is actually active in D0 mode.
> We can then read the magic 0x2240c
> register on the GPU, which will have bit 1 set if the GPU's firmware has
> already been posted during a previous boot. Once we've confirmed all of
> this, we reset the PCI device and re-disable it - bringing the GPU back
> into a healthy state.
>
> Signed-off-by: Lyude Paul <lyude@xxxxxxxxxx>
> Cc: nouveau@xxxxxxxxxxxxxxxxxxxxx
> Cc: dri-devel@xxxxxxxxxxxxxxxxxxxxx
> Cc: Karol Herbst <kherbst@xxxxxxxxxx>
> Cc: Ben Skeggs <skeggsb@xxxxxxxxx>
> Cc: stable@xxxxxxxxxxxxxxx

Applied to pci/misc for v5.2, thanks!

> ---
>  drivers/pci/quirks.c | 65 ++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 65 insertions(+)
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index b0a413f3f7ca..948492fda8bf 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5117,3 +5117,68 @@ SWITCHTEC_QUIRK(0x8573);  /* PFXI 48XG3 */
>  SWITCHTEC_QUIRK(0x8574);  /* PFXI 64XG3 */
>  SWITCHTEC_QUIRK(0x8575);  /* PFXI 80XG3 */
>  SWITCHTEC_QUIRK(0x8576);  /* PFXI 96XG3 */
> +
> +/*
> + * On certain Lenovo Thinkpad P50 SKUs, specifically those with a Nvidia
> + * Quadro M1000M, the BIOS will occasionally make the mistake of not resetting
> + * the nvidia GPU between reboots if the system is configured to use hybrid
> + * graphics mode. This results in the GPU being left in whatever state it was
> + * in during the previous boot which causes spurious interrupts from the GPU,
> + * which in turn cause us to disable the wrong IRQs and end up breaking the
> + * touchpad. Unsurprisingly, this also completely breaks nouveau.
> + *
> + * Luckily, it seems a simple reset of the PCI device for the nvidia GPU
> + * manages to bring the GPU back into a clean state and fix all of these
> + * issues. Additionally since the GPU will report NoReset+ when the machine is
> + * configured in Dedicated display mode, we don't need to worry about
> + * accidentally resetting the GPU when it's supposed to already be
> + * initialized.
> + */
> +static void
> +quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot(struct pci_dev *pdev)
> +{
> +	void __iomem *map;
> +	int ret;
> +
> +	if (pdev->subsystem_vendor != PCI_VENDOR_ID_LENOVO ||
> +	    pdev->subsystem_device != 0x222e ||
> +	    !pdev->reset_fn)
> +		return;
> +
> +	/*
> +	 * If we can't enable the device's mmio space, it's probably not even
> +	 * initialized. This is fine, and means we can just skip the quirk
> +	 * entirely.
> +	 */
> +	if (pci_enable_device_mem(pdev)) {
> +		pci_dbg(pdev, "Can't enable device mem, no reset needed\n");
> +		return;
> +	}
> +
> +	/* Taken from drivers/gpu/drm/nouveau/engine/device/base.c */
> +	map = ioremap(pci_resource_start(pdev, 0), 0x102000);
> +	if (!map) {
> +		pci_err(pdev, "Can't map MMIO space, this is probably very bad\n");
> +		goto out_disable;
> +	}
> +
> +	/*
> +	 * Be extra careful, and make sure that the GPU firmware is posted
> +	 * before trying a reset
> +	 */
> +	if (ioread32(map + 0x2240c) & 0x2) {
> +		pci_info(pdev,
> +			 FW_BUG "GPU left initialized by EFI, resetting\n");
> +		ret = pci_reset_function(pdev);
> +		if (ret < 0)
> +			pci_err(pdev, "Failed to reset GPU: %d\n", ret);
> +	}
> +
> +	iounmap(map);
> +out_disable:
> +	pci_disable_device(pdev);
> +}
> +
> +DECLARE_PCI_FIXUP_CLASS_FINAL(PCI_VENDOR_ID_NVIDIA, 0x13b1,
> +			      PCI_CLASS_DISPLAY_VGA, 8,
> +			      quirk_lenovo_thinkpad_p50_nvgpu_survives_reboot);
> --
> 2.20.1
>
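
For anyone following the order of checks in the quirk above, here is a rough userspace sketch of just the decision logic. It is not kernel code and touches no hardware. The struct fake_gpu type, its field names, and gpu_needs_reset() are made up for illustration; each field merely stands in for a call or field used in the patch (pdev->reset_fn, pci_enable_device_mem(), and the ioread32() of offset 0x2240c). The only values assumed beyond the patch text are the bit mask 0x2 for "bit 1" and 0x17aa as the value of PCI_VENDOR_ID_LENOVO.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values from the patch; 0x17aa is the value of PCI_VENDOR_ID_LENOVO. */
#define LENOVO_SUBSYSTEM_VENDOR 0x17aa
#define P50_SUBSYSTEM_DEVICE    0x222e
#define NV_POSTED_BIT           0x2    /* bit 1 of the register at 0x2240c */

/* Hypothetical stand-in for the state the quirk inspects on a pci_dev. */
struct fake_gpu {
	uint16_t subsystem_vendor;
	uint16_t subsystem_device;
	bool can_reset;      /* models a non-NULL pdev->reset_fn */
	bool mem_enabled;    /* models pci_enable_device_mem() succeeding */
	uint32_t posted_reg; /* models ioread32(map + 0x2240c) */
};

/* Returns true only when every condition the quirk checks is satisfied. */
bool gpu_needs_reset(const struct fake_gpu *gpu)
{
	/* Wrong machine, or the device advertises no usable reset method. */
	if (gpu->subsystem_vendor != LENOVO_SUBSYSTEM_VENDOR ||
	    gpu->subsystem_device != P50_SUBSYSTEM_DEVICE ||
	    !gpu->can_reset)
		return false;

	/* MMIO could not be enabled: GPU is probably not initialized. */
	if (!gpu->mem_enabled)
		return false;

	/* Reset only if firmware was already posted by a previous boot. */
	return (gpu->posted_reg & NV_POSTED_BIT) != 0;
}
```

The sketch keeps the same ordering as the patch: the cheap identity checks come first, and the register is only consulted once the device memory is known to be accessible.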