Re: drm/amdgpu: AMDGPU unusable since 6.12.1 and it looks like no one cares.

On Sun, Jan 19, 2025 at 5:53 PM Pavel Nikulin <pavel@xxxxxxxxxxxx> wrote:
>
> On Fri, Jan 17, 2025 at 6:08 PM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
> >
> > On Fri, Jan 17, 2025 at 7:27 AM Pavel Nikulin <pavel@xxxxxxxxxxxx> wrote:
> > >
> > > I think it persists as of 6.12.9 and today's firmware version from git.
> > >
> > > Hardware: Asus um560.6
> > >
> > > It only happens when the AC adapter is disconnected and the screen
> > > refresh rate is set to 120Hz. It does not happen at any other
> > > refresh rate, or when the charger is connected.
> > >
> > > It might be happening in Windows too, but at a much lower rate, like
> > > once a month. The Windows version might be applying some mitigations.
> > >
> > > Trying to catch what may be a prelude to the hang never worked. It is
> > > just an instant hang, without a panic or anything else. I cannot debug
> > > it without JTAGing the CPU, for which I have no equipment, nor am I
> > > sure whether JTAG headers are even exposed on the laptop motherboard.
> >
> > Please file a bug report and attach your dmesg output.
> > https://gitlab.freedesktop.org/drm/amd/-/issues
> >
> > Alex
>
> Unfortunately, what I would have is the same dmesg as anyone
> else; however, I have made the following observations:
>
> Disabling PSR with the debug mask makes it stable.
>
> If I set the refresh rate to 60Hz, the LPDDR memory clocks wiggle
> around 600MHz and keep going back and forth (spread spectrum
> working).
>
> If I switch to any other refresh rate, they stay stable at 937MHz
> (spread spectrum stops working), and hangs happen.
>
> If I disconnect the antennas from the MT7925 WiFi module, the issues
> are gone (as is the WiFi connectivity).
>
> If I rfkill the MT7925 (both WiFi and Bluetooth), it may still hang.
>
> If I nevertheless try to connect by putting the open laptop right next
> to the access point, the laptop will hang.
>
> But if I only do the same with a 2.4GHz Bluetooth mouse, it
> continues to work. If I connect to 2.4GHz WiFi, it still hangs
> after a few minutes.
>
> If I use the RTL8156BG-based USB Type-C dongle and disconnect the
> power, it is stable. If I keep the connection going on the Type-C
> dongle but switch on WiFi and set it as the default route, everything
> is stable, regardless of whether I connect to 5GHz or 2.4GHz WiFi.
>
> Putting grounding tape around the DP cables and around the WiFi
> module made no conclusive difference.
>
> Manually setting the GPU performance level to high marginally
> reduces the hang rate.
>
> DP 2.0 and 2.1 work at 600MHz, 1.4 at 300MHz, and 1.2 at 150MHz,
> depending on the link speed, which I can't measure.
>
> So, here is what I think may have changed in the transition from 6.11 to 6.12:
>
> - Something PCIe-related (ASPM, other PCIe frequency/power settings)
> - Something PSR-related (PSR raises the memory clock rate and disables
> spread spectrum)
> - Something power-related (undervoltage happens when the Type-C port
> or power is not plugged in)
> - Something RF-related (made less likely by it continuing to work
> with the Type-C Ethernet dongle plugged in but not active)
>
> My guess is that it's an interplay between the PCIe and PSR settings.
> Less likely, a hardware problem.
>
> I do remember that someone with a similar bug bisected the breakage
> to a PCIe-related commit.
>
> Do you still want me to put all of the above into a bug ticket on GitLab?
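To check the memory-clock observation above without a GUI tool, one could poll the amdgpu `pp_dpm_mclk` sysfs file, in which the currently selected DPM level is marked with `*`. This is a minimal sketch, assuming the APU is DRM `card0` (it may be `card1` on some systems) and that this APU exposes `pp_dpm_mclk` at all; `active_mclk_mhz` and `sample_mclk` are hypothetical helper names. Note that it only shows the selected DPM level, not spread-spectrum modulation itself.

```python
import re
import time
from pathlib import Path
from typing import List, Optional

# Assumption: the AMD APU is DRM card0; on some systems it is card1.
MCLK_PATH = Path("/sys/class/drm/card0/device/pp_dpm_mclk")

# Each level line looks like "2: 937Mhz *"; a trailing '*' marks the active level.
_LEVEL_RE = re.compile(r"^\s*(\d+):\s*(\d+)\s*mhz\s*(\*)?\s*$", re.IGNORECASE)


def active_mclk_mhz(text: str) -> Optional[int]:
    """Return the active memory clock in MHz, or None if no level is marked."""
    for line in text.splitlines():
        match = _LEVEL_RE.match(line)
        if match and match.group(3):
            return int(match.group(2))
    return None


def sample_mclk(samples: int = 20, interval_s: float = 0.5) -> List[Optional[int]]:
    """Poll pp_dpm_mclk to see whether the active level moves or stays pinned."""
    readings = []
    for _ in range(samples):
        readings.append(active_mclk_mhz(MCLK_PATH.read_text()))
        time.sleep(interval_s)
    return readings
```

Running `sample_mclk()` once at 60Hz and once at 120Hz on battery should show whether the selected level moves between values or stays pinned at a single one.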

What stabilises the system:

The following kernel command-line parameters:
pcie_aspm=off
amdgpu_debugmask=0x200
amdgpu_debugmask=0x10
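For reference, upstream amdgpu exposes the display debug mask as the `amdgpu.dcdebugmask` module parameter, and in recent kernels `enum DC_DEBUG_MASK` (in `amd_shared.h`) defines 0x10 as `DC_DISABLE_PSR` and 0x200 as `DC_DISABLE_PSR_SU`; it is worth checking those values against your own tree. When the same module parameter appears twice on the command line, the later value replaces the earlier one, so the two flags are better OR'd into a single value. This is a small sketch; `combine_masks` and `decode_mask` are hypothetical helper names.

```python
from typing import List

# Assumed bit values from enum DC_DEBUG_MASK in
# drivers/gpu/drm/amd/include/amd_shared.h; verify against your kernel tree.
DC_DEBUG_BITS = {
    0x10: "DC_DISABLE_PSR",
    0x200: "DC_DISABLE_PSR_SU",
}


def combine_masks(*masks: int) -> int:
    """OR several debug-mask values into the single value the parameter takes."""
    result = 0
    for mask in masks:
        result |= mask
    return result


def decode_mask(mask: int) -> List[str]:
    """Name the known flags set in mask; report any leftover bits in hex."""
    names = []
    remaining = mask
    for bit, name in sorted(DC_DEBUG_BITS.items()):
        if mask & bit:
            names.append(name)
            remaining &= ~bit
    if remaining:
        names.append(f"unknown bits {remaining:#x}")
    return names


combined = combine_masks(0x200, 0x10)
print(f"amdgpu.dcdebugmask={combined:#x}")  # amdgpu.dcdebugmask=0x210
print(decode_mask(combined))  # ['DC_DISABLE_PSR', 'DC_DISABLE_PSR_SU']
```

With the two values above this yields one parameter, `amdgpu.dcdebugmask=0x210`, that requests both PSR and PSR-SU to be disabled.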
