I apologize if the name callout is disconcerting. I was trying to follow the instructions for reporting bugs and saw your name listed as the owner of this code area.

FYI, I've done some more troubleshooting and tinkering regarding the crashing, and Mate seems to be at the center of all the issues. As a result, I also opened an issue with the Mate Desktop team (https://github.com/mate-desktop/mate-panel/issues/1242). Mate also has a power management component, which is probably responsible for the excess logging and the confusion over Navi10. However, I have no way to vouch for how accurately the Mate PM applet gathered data for its instantiation. As far as I'm aware, I have no external devices connected that would use it, since I thought that would be via HDMI. I *do* have a Jabra Evolve2 headset that uses the USB Type-C connector, but I assume that's not using the GPU.

The issue documentation I left with Mate notes the following. If I launch apps from a terminal that is NOT launched from the Mate panel (I right-click on the desktop instead to open the terminal), the parent of all the apps (Firefox, Evolution, etc.) is separate from Mate (at least separate from mate-panel). Since I've done that (and turned off the screensaver and screensaver lock), everything has worked fine, except for the constant logging of the wake-up action. So, I'm not sure what else to do at this point. Please advise if I should do anything on the driver side.

Thanks,
Tim

On Thu, 2021-07-29 at 11:14 -0400, Felix Kuehling wrote:
On 2021-07-28 at 12:10 p.m., Tim Cahill wrote:
Hi Felix,

I'm not sure why you're calling me out by name. I'm not working on anything obviously related to your crashes.

Anyway, I took a quick look at the backtraces. They all point at libgdk. Two of them are segfaults, one is an abort. It's not clear how these would be related to the GPU driver. That said, when you boot with nomodeset, the GPU driver and all HW acceleration are completely disabled. If that makes the problem disappear, the GPU driver is clearly involved in the problem in some way.

The abort points at a problem while freeing memory. This could be caused by a double-free problem in some unrelated code, possibly related to the GPU driver. This would be a problem in a user mode component (maybe Mesa), not the kernel mode driver.

I believe the messages you're seeing when you move the mouse are the result of runtime power management, which puts the GPU to sleep when it's idle and reinitializes it when it's needed. You have 2 GPUs in your laptop: an integrated Renoir GPU in the Ryzen CPU, and an external Navi10 GPU for higher gaming performance. The GPU that goes to sleep and wakes up is the external Navi10 GPU.

The OpenGL renderer string specifies "RENOIR". Therefore I'm surprised that the Navi10 GPU wakes up when you move the mouse. Ideally it shouldn't be used at all when you're just using the desktop.

If you suspect that runtime power management is responsible for your problems, you could disable it with amdgpu.runpm=0 on the kernel command line. That means the Navi10 GPU won't go into the low power mode and will drain your battery more quickly. So this is not a permanent solution, just an experiment to narrow down the problem.

Regards,
Felix

I'm not sure how to do this, as I haven't had to report a bug before. I've looked at a variety of bug reporting sites (such as the Mate project) to see if anyone else is running into the same issues that I'm having. I haven't seen anything at all similar to the issue I'm having.
I had issues with AMD drivers on my distro (info below), and some consistent, high-volume dmesg content shows up. So I've decided that I should start here with the AMD kernel team.

I have a fairly new MSI laptop with the following configuration:

[code]
System:    Kernel: 5.11.0-25-generic x86_64 bits: 64 compiler: N/A
           Desktop: MATE 1.24.0 wm: marco dm: LightDM Distro: Linux Mint 20.2 Uma base: Ubuntu 20.04 focal
Machine:   Type: Laptop System: Micro-Star product: Alpha 17 A4DEK v: REV:1.0 serial: <filter>
           Chassis: type: 10 serial: <filter>
           Mobo: Micro-Star model: MS-17EK v: REV:1.0 serial: <filter> UEFI: American Megatrends
           v: E17EKAMS.101 date: 10/26/2020
Battery:   ID-1: BAT1 charge: 66.2 Wh condition: 67.0/65.7 Wh (102%) volts: 12.4/10.8
           model: MSI Corp. MS-17EK serial: N/A status: Unknown
CPU:       Topology: 8-Core model: AMD Ryzen 7 4800H with Radeon Graphics bits: 64 type: MT MCP
           arch: Zen rev: 1 L2 cache: 4096 KiB
           flags: avx avx2 lm nx pae sse sse2 sse3 sse4_1 sse4_2 sse4a ssse3 svm bogomips: 92630
           Speed: 4278 MHz min/max: 1400/2900 MHz Core speeds (MHz): 1: 4280 2: 1865 3: 1397
           4: 2188 5: 1489 6: 2265 7: 1907 8: 1906 9: 1729 10: 1397 11: 1397 12: 1397 13: 1397
           14: 1397 15: 1907 16: 1740
Graphics:  Device-1: AMD Navi 10 [Radeon RX 5600 OEM/5600 XT /5700/5700 XT]
           vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 03:00.0 chip ID: 1002:731f
           Device-2: AMD Renoir vendor: Micro-Star MSI driver: amdgpu v: kernel bus ID: 08:00.0
           chip ID: 1002:1636
           Display: x11 server: X.Org 1.20.9 driver: amdgpu,ati
           unloaded: fbdev,modesetting,radeon,vesa compositor: marco
           resolution: 1920x1080~144Hz
           OpenGL: renderer: AMD RENOIR (DRM 3.40.0 5.11.0-25-generic LLVM 11.0.0)
           v: 4.6 Mesa 20.2.6 direct render: Yes
Audio:     Device-1: AMD Navi 10 HDMI Audio vendor: Micro-Star MSI
           driver: snd_hda_intel v: kernel bus ID: 03:00.1 chip ID: 1002:ab38
           Device-2: AMD Raven/Raven2/FireFlight/Renoir Audio Processor vendor: Micro-Star MSI
           driver: N/A bus ID: 08:00.5 chip ID: 1022:15e2
           Device-3: AMD Family 17h HD Audio vendor: Micro-Star MSI driver: snd_hda_intel
           v: kernel bus ID: 08:00.6 chip ID: 1022:15e3
           Sound Server: ALSA v: k5.11.0-25-generic
Network:   Device-1: Intel Wi-Fi 6 AX200 driver: iwlwifi v: kernel bus ID: 04:00.0
           chip ID: 8086:2723
           IF: wlp4s0 state: up mac: <filter>
           Device-2: Realtek RTL8111/8168/8411 PCI Express Gigabit Ethernet vendor: Micro-Star MSI
           driver: r8169 v: kernel port: f000 bus ID: 05:00.0 chip ID: 10ec:8168
           IF: eno1 state: down mac: <filter>
Drives:    Local Storage: total: 476.94 GiB used: 89.79 GiB (18.8%)
           ID-1: /dev/nvme0n1 vendor: Kingston model: OM8PCP3512F-AI1 size: 476.94 GiB
           speed: 31.6 Gb/s lanes: 4 serial: <filter>
Partition: ID-1: / size: 466.30 GiB used: 89.28 GiB (19.1%) fs: ext4 dev: /dev/dm-1
           ID-2: /boot size: 704.5 MiB used: 519.7 MiB (73.8%) fs: ext4 dev: /dev/nvme0n1p2
           ID-3: swap-1 size: 980.0 MiB used: 0 KiB (0.0%) fs: swap dev: /dev/dm-2
USB:       Hub: 1-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 chip ID: 1d6b:0002
           Device-1: 1-3:2 info: SteelSeries ApS SteelSeries KLC type: HID
           driver: hid-generic,usbhid rev: 2.0 chip ID: 1038:1122
           Device-2: 1-4:3 info: Acer HD Webcam type: Video driver: uvcvideo rev: 2.0
           chip ID: 5986:211c
           Hub: 2-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 chip ID: 1d6b:0003
           Hub: 3-0:1 info: Full speed (or root) Hub ports: 4 rev: 2.0 chip ID: 1d6b:0002
           Device-3: 3-3:2 info: Intel type: Bluetooth driver: btusb rev: 2.0 chip ID: 8087:0029
           Hub: 4-0:1 info: Full speed (or root) Hub ports: 2 rev: 3.1 chip ID: 1d6b:0003
Sensors:   System Temperatures: cpu: 46.5 C mobo: N/A
           Fan Speeds (RPM): N/A
           GPU: device: amdgpu temp: 0 C fan: 65535 device: amdgpu temp: 31 C
Repos:     No active apt repos in: /etc/apt/sources.list
           Active apt repos in: /etc/apt/sources.list.d/official-package-repositories.list
           1: deb http://mirrors.seas.harvard.edu/linuxmint-packages uma main upstream import backport
           2: deb http://mirror.us-ny2.kamatera.com/ubuntu focal main restricted universe multiverse
           3: deb http://mirror.us-ny2.kamatera.com/ubuntu focal-updates main restricted universe multiverse
           4: deb http://mirror.us-ny2.kamatera.com/ubuntu focal-backports main restricted universe multiverse
           5: deb http://security.ubuntu.com/ubuntu/ focal-security main restricted universe multiverse
           6: deb http://archive.canonical.com/ubuntu/ focal partner
Info:      Processes: 372 Uptime: 2h 44m Memory: 15.10 GiB used: 1.15 GiB (7.6%) Init: systemd
           v: 245 runlevel: 5 Compilers: gcc: 9.3.0 alt: 9 Client: Unknown python3.8 client
           inxi: 3.0.38
[/code]

When I use the laptop interactively, I get random crashes. They hit elements of Mate (mate-panel, etc.) consistently, just not predictably. LibreOffice applications, xed, Firefox, and Evolution seem to be more prone to crashing the X session. I can easily move to tty1, log in, and kill services running in tty7, as the crashes don't appear to completely kill tty7. Sometimes I can kill Mate and launch a new instance to salvage the tty7 session. However, I usually end up having to kill the root PID of the X Window session in order to log in again.

But I think this is related to the AMD GPU driver. Every time I simply move the mouse in the tty7 session, I get the following in dmesg:

[13164.399550] [drm] PCIE GART of 512M enabled (table at 0x0000008000000000).
[13164.399579] [drm] PSP is resuming...
[13164.486593] [drm] reserve 0xa00000 from 0x800f400000 for PSP TMR
[13164.678788] amdgpu 0000:03:00.0: amdgpu: RAS: optional ras ta ucode is not available
[13164.702624] amdgpu 0000:03:00.0: amdgpu: RAP: optional rap ta ucode is not available
[13164.702639] amdgpu 0000:03:00.0: amdgpu: SMU is resuming...
[13164.702648] amdgpu 0000:03:00.0: amdgpu: smu driver if version = 0x00000036, smu fw if version = 0x00000037, smu fw version = 0x002a3f00 (42.63.0)
[13164.702664] amdgpu 0000:03:00.0: amdgpu: SMU driver if version not matched
[13164.746143] amdgpu 0000:03:00.0: amdgpu: SMU is resumed successfully!
[13164.768978] [drm] kiq ring mec 2 pipe 1 q 0
[13164.779651] [drm] VCN decode and encode initialized successfully(under DPG Mode).
[13164.779758] [drm] JPEG decode initialized successfully.
[13164.779779] amdgpu 0000:03:00.0: amdgpu: ring gfx_0.0.0 uses VM inv eng 0 on hub 0
[13164.779783] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.0 uses VM inv eng 1 on hub 0
[13164.779784] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.0 uses VM inv eng 4 on hub 0
[13164.779785] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.0 uses VM inv eng 5 on hub 0
[13164.779786] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.0 uses VM inv eng 6 on hub 0
[13164.779787] amdgpu 0000:03:00.0: amdgpu: ring comp_1.0.1 uses VM inv eng 7 on hub 0
[13164.779788] amdgpu 0000:03:00.0: amdgpu: ring comp_1.1.1 uses VM inv eng 8 on hub 0
[13164.779789] amdgpu 0000:03:00.0: amdgpu: ring comp_1.2.1 uses VM inv eng 9 on hub 0
[13164.779790] amdgpu 0000:03:00.0: amdgpu: ring comp_1.3.1 uses VM inv eng 10 on hub 0
[13164.779792] amdgpu 0000:03:00.0: amdgpu: ring kiq_2.1.0 uses VM inv eng 11 on hub 0
[13164.779793] amdgpu 0000:03:00.0: amdgpu: ring sdma0 uses VM inv eng 12 on hub 0
[13164.779803] amdgpu 0000:03:00.0: amdgpu: ring sdma1 uses VM inv eng 13 on hub 0
[13164.779804] amdgpu 0000:03:00.0: amdgpu: ring vcn_dec uses VM inv eng 0 on hub 1
[13164.779805] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc0 uses VM inv eng 1 on hub 1
[13164.779806] amdgpu 0000:03:00.0: amdgpu: ring vcn_enc1 uses VM inv eng 4 on hub 1
[13164.779807] amdgpu 0000:03:00.0: amdgpu: ring jpeg_dec uses VM inv eng 5 on hub 1
[13164.783807] amdgpu 0000:03:00.0: [drm] Cannot find any crtc or sizes
[13170.722306] [drm] free PSP TMR buffer

If I boot with nomodeset, I can operate fine, just without screen brightness control, etc. It just seems strange that an event like this is generated all the time. I only get sporadic crashes, though. Humorously, I've been running only Firefox, the crash reporter, and Mate Terminal this morning, and it has run fine for over 4 hours. There were times when I wasn't running anything at all and it would lock up on me.
So I just can't find any common denominator for this. (I'm using vi in a terminal to type this and will copy-paste it into my email client [Evolution] once I'm done.)

I've attached 3 crash reports that were captured on the system over the last couple of days. I apologize in advance, profusely, if the problem turns out to be somewhere else.

Thanks,
Tim

_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx