Hi, > -----Original Message----- > From: Intel-gfx <intel-gfx-bounces@xxxxxxxxxxxxxxxxxxxxx> On Behalf Of Chris Chiu > Sent: perjantai 21. toukokuuta 2021 7.02 > To: Rafael J. Wysocki <rafael@xxxxxxxxxx> > Cc: Brown, Len <len.brown@xxxxxxxxx>; Karol Herbst <kherbst@xxxxxxxxxx>; Linux > PM <linux-pm@xxxxxxxxxxxxxxx>; Linux PCI <linux-pci@xxxxxxxxxxxxxxx>; > Westerberg, Mika <mika.westerberg@xxxxxxxxx>; Rafael J. Wysocki > <rjw@xxxxxxxxxxxxx>; dri-devel <dri-devel@xxxxxxxxxxxxxxxxxxxxx>; Bjorn Helgaas > <bhelgaas@xxxxxxxxxx>; intel-gfx@xxxxxxxxxxxxxxxxxxxxx > Subject: Re: [Intel-gfx] NVIDIA GPU fallen off the bus after exiting s2idle > > On Thu, May 6, 2021 at 5:46 PM Rafael J. Wysocki <rafael@xxxxxxxxxx> wrote: > > > > On Tue, May 4, 2021 at 10:08 AM Chris Chiu <chris.chiu@xxxxxxxxxxxxx> wrote: > > > > > > Hi, > > > We have some Intel laptops (11th generation CPU) with NVIDIA GPU > > > suffering the same GPU falling off the bus problem while exiting > > > s2idle with external display connected. These laptops connect the > > > external display via the HDMI/DisplayPort on a USB Type-C interfaced > > > dock. If we enter and exit s2idle with the dock connected, the > > > NVIDIA GPU (confirmed on 10de:24b6 and 10de:25b8) and the PCIe port > > > can come back to D0 w/o problem. If we enter the s2idle, disconnect > > > the dock, then exit the s2idle, both external display and the panel > > > will remain with no output. The dmesg as follows shows the "nvidia > 0000:01:00.0: > > > can't change power state from D3cold to D0 (config space > > > inaccessible)" due to the following ACPI error [ 154.446781] [ > > > 154.446783] [ 154.446783] Initialized Local Variables for Method > > > [IPCS]: > > > [ 154.446784] Local0: 000000009863e365 <Obj> Integer > > > 00000000000009C5 [ 154.446790] [ 154.446791] Initialized Arguments > > > for Method [IPCS]: (7 arguments defined for method invocation) [ > > > 154.446792] Arg0: 0000000025568fbd <Obj> Integer 00000000000000AC [ > > > 154.446795] Arg1: 000000009ef30e76 <Obj> Integer 0000000000000000 [ > > > 154.446798] Arg2: 00000000fdf820f0 <Obj> Integer 0000000000000010 [ > > > 154.446801] Arg3: 000000009fc2a088 <Obj> Integer 0000000000000001 [ > > > 154.446804] Arg4: 000000003a3418f7 <Obj> Integer 0000000000000001 [ > > > 154.446807] Arg5: 0000000020c4b87c <Obj> Integer 0000000000000000 [ > > > 154.446810] Arg6: 000000008b965a8a <Obj> Integer 0000000000000000 [ > > > 154.446813] [ 154.446815] ACPI Error: Aborting method \IPCS due to > > > previous error > > > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529) [ 154.446824] ACPI > > > Error: Aborting method \MCUI due to previous error > > > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529) [ 154.446829] ACPI > > > Error: Aborting method \SPCX due to previous error > > > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529) [ 154.446835] ACPI > > > Error: Aborting method \_SB.PC00.PGSC due to previous error > > > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529) [ 154.446841] ACPI > > > Error: Aborting method \_SB.PC00.PGON due to previous error > > > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529) [ 154.446846] ACPI > > > Error: Aborting method \_SB.PC00.PEG1.NPON due to previous error > > > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529) [ 154.446852] ACPI > > > Error: Aborting method \_SB.PC00.PEG1.PG01._ON due to previous error > > > (AE_AML_LOOP_TIMEOUT) (20200925/psparse-529) [ 154.446860] acpi > > > device:02: Failed to change power state to D0 [ 154.690760] video > > > LNXVIDEO:00: Cannot transition to power state D0 for parent in > > > (unknown) > > > > If I were to guess, I would say that AML tries to access memory that > > is not accessible while suspended, probably PCI config space. > > > > > The IPCS is the last function called from \_SB.PC00.PEG1.PG01._ON > > > which we expect it to prepare everything before bringing back the > > > NVIDIA GPU but it's stuck in the infinite loop as described below. > > > Please refer to > > > https://gist.github.com/mschiu77/fa4f5a97297749d0d66fe60c1d421c44 > > > for the full DSDT.dsl. > > > > The DSDT alone may not be sufficient. > > > > Can you please create a bug entry at bugzilla.kernel.org for this > > issue and attach the full output of acpidump from one of the affected > > machines to it? And please let me know the number of the bug. > > > > Also please attach the output of dmesg including a suspend-resume > > cycle including dock disconnection while suspended and the ACPI > > messages quoted below. > > > > > While (One) > > > { > > > If ((!IBSY || (IERR == One))) > > > { > > > Break > > > } > > > > > > If ((Local0 > TMOV)) > > > { > > > RPKG [Zero] = 0x03 > > > Return (RPKG) /* \IPCS.RPKG */ > > > } > > > > > > Sleep (One) > > > Local0++ > > > } > > > > > > And the upstream PCIe port of NVIDIA seems to become inaccessible > > > due to the messages as follows. > > > [ 292.746508] pcieport 0000:00:01.0: waiting 100 ms for downstream > > > link, after activation [ 292.882296] pci 0000:01:00.0: waiting > > > additional 100 ms to become accessible [ 316.876997] pci > > > 0000:01:00.0: can't change power state from D3cold to D0 (config > > > space inaccessible) > > > > > > Since the IPCS is the Intel Reference Code and we don't really know > > > why the never-end loop happens just because we unplug the dock while > > > the system still stays in s2idle. Can anyone from Intel suggest what > > > happens here? > > > > This list is not the right channel for inquiries related to Intel > > support, we can only help you as Linux kernel developers in this > > venue. > > > > > And one thing also worth mentioning, if we unplug the display cable > > > from the dock before entering the s2idle, NVIDIA GPU can come back > > > w/o problem even if we disconnect the dock before exiting s2idle. > > > Here's the lspci information > > > https://gist.github.com/mschiu77/0bfc439d15d52d20de0129b1b2a86dc4 > > > and the dmesg log with ACPI trace_state enabled and dynamic debug on > > > for drivers/pci/pci.c, drivers/acpi/device_pm.c for the whole s2idle > > > enter/exit with IPCS timeout. > > > > > > Any suggestion would be appreciated. Thanks. > > > > First, please use proper Intel support channels for BIOS-related inquiries. > > > > Second, please open a bug as suggested above and let's use it for > > further communication regarding this issue as far as Linux is > > concerned. > > > > Thanks! > > Thanks for the suggestion. I opened > https://bugzilla.kernel.org/show_bug.cgi?id=212951 and have a new finding in > https://bugzilla.kernel.org/show_bug.cgi?id=212951#c13. It seems that maybe we > could do something in the i915 driver during resume to handle the hpd (because we > unplug the dock/dongle when > suspended) at the very beginning. Since it involves HPD, PMC and the BIOS, I don't > know which way I should go to fix this since Windows won't hit this problem. How about https://gitlab.freedesktop.org/drm/intel/-/wikis/How-to-file-i915-bugs to get also our devs involved better. > > Please let me know if there's any information missing in the bugzilla.kernel ticket. > Any suggestions would be appreciated. Thanks > > Chris > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@xxxxxxxxxxxxxxxxxxxxx > https://lists.freedesktop.org/mailman/listinfo/intel-gfx