On Sun, Jan 19, 2025 at 5:53 PM Pavel Nikulin <pavel@xxxxxxxxxxxx> wrote:
>
> On Fri, Jan 17, 2025 at 6:08 PM Alex Deucher <alexdeucher@xxxxxxxxx> wrote:
> >
> > On Fri, Jan 17, 2025 at 7:27 AM Pavel Nikulin <pavel@xxxxxxxxxxxx> wrote:
> > >
> > > I think it persists as of 6.12.9 and today's firmware version from git.
> > >
> > > Hardware Asus um560.6
> > >
> > > It only happens when the AC adaptor is disconnected, and the screen
> > > refresh frequency is set to 120hz. It does not happen on any other
> > > refresh frequency, or when the charger is connected.
> > >
> > > It might be happening in Windows, but at much lower rate, like once in
> > > a month. The windows version might be applying some mitigations.
> > >
> > > Trying to catch what may be a prelude to hang never worked. It's just
> > > instahang, without panic, or anything. I cannot debug it without
> > > JTAGing the CPU, for which I have no equipment, nor am I sure if there
> > > are even JTAG headers exposed on the laptop motherboard.
> >
> > Please file a bug report and attach your dmesg output.
> > https://gitlab.freedesktop.org/drm/amd/-/issues
> >
> > Alex
>
> Unfortunately, what I would have would be the same dmesg as anyone
> else, however I have made the following observations:
>
> Disabling PSR with the debug mask makes it stable.
>
> If I set the refresh frequency to 60Hz, the LPDDR memory clocks wiggle
> around 600MHz, and keep going back and forth (spread spectrum
> working.)
>
> If I switch to any other frequency, they stay stably at 937MHz (spread
> spectrum stops working,) and hangs happen.
>
> If I disconnect the antennas from the MT7925 WiFi module, the issues are
> gone (as well as the wifi connectivity.)
>
> If I RFKILL the mt7925, both wifi and bluetooth, it may still hang.
>
> If I nevertheless try to connect by putting the open laptop right next
> to the access point, the laptop will hang.
>
> But if I only try to do the same with a 2.4GHz bluetooth mouse, it will
> continue to work. If I connect to 2.4GHz wifi, it will still hang
> after a few minutes.
>
> If I use the RTL8156BG based type-C usb dongle, and disconnect the
> power, it works stably. If I keep the connection going on the type-C
> dongle, but switch on wifi, and set it as the default route, everything
> works stably, regardless of whether I connect to 5GHz or 2.4GHz wifi.
>
> Putting grounding tape around the DP cables and around the wifi
> module did not do anything conclusive.
>
> If I manually set the GPU performance to high, it marginally
> improves the hanging rate.
>
> DP 2.0 and 2.1 work on 600MHz, 1.4 on 300MHz, 1.2 on 150MHz,
> depending on link speed, which I can't measure.
>
> So, here is what I think may have happened during the transition from
> 6.11 to 6.12:
>
> - Something PCIE related (ASPM, other PCIE frequency/power settings)
> - Something PSR related (PSR raises memory clock rate, disables spread spectrum)
> - Something power related (undervoltage happens when type-C port, or
> power is not plugged in)
> - Something RF related (rendered less likely by it keeping working
> with type-C ethernet dongle plugged in, but not active)
>
> My guess is that it's an interplay between the PCIE and PSR settings.
> Less likely, a hardware problem.
>
> I do remember someone with a similar bug bisecting the breakage to
> a PCIE related commit.
>
> Do you want me to still put all of the above into a bug ticket on gitlab?

What is stabilising the system is the following kernel command line
parameters:

pcie_aspm=off amdgpu_debugmask=0x200 amdgpu_debugmask=0x10
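
For anyone who wants to confirm which of these parameters actually
reached the running kernel, here is a minimal sketch in Python. It reads
/proc/cmdline and lists the values given for each name. The parameter
names are copied verbatim from the line above and are not checked
against what the driver accepts. The helper names (parse_cmdline,
combined_mask, report) are made up for this example. The "combined
mask" line assumes the option is a plain integer bitmask and that a
repeated parameter keeps only its last value, which I have not verified
for this case.

```python
from collections import defaultdict

# Names copied verbatim from the message; their exact spelling as accepted
# by the kernel/driver is NOT verified here (assumption).
WANTED = ("pcie_aspm", "amdgpu_debugmask")


def parse_cmdline(text):
    """Map each key=value parameter name to the list of values given for it."""
    params = defaultdict(list)
    for token in text.split():
        key, sep, value = token.partition("=")
        if sep:  # skip bare flags such as "quiet"
            params[key].append(value)
    return dict(params)


def combined_mask(values):
    """OR together bitmask strings such as '0x200' and '0x10'."""
    mask = 0
    for value in values:
        mask |= int(value, 0)  # base 0 accepts both hex and decimal
    return mask


def report(text):
    """Return human-readable lines describing the parameters of interest."""
    params = parse_cmdline(text)
    lines = []
    for name in WANTED:
        values = params.get(name, [])
        lines.append(f"{name}: {values if values else 'not set'}")
    masks = params.get("amdgpu_debugmask", [])
    if len(masks) > 1:
        # Assumption: a repeated parameter does not accumulate bits.
        lines.append(f"combined mask if OR-ed: {combined_mask(masks):#x}")
    return lines


if __name__ == "__main__":
    with open("/proc/cmdline") as f:
        print("\n".join(report(f.read())))
```

If that assumption holds, the two mask values would have to be passed
as a single OR-ed value to take effect together.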