RE: 7840U amdgpu MMVM_L2_PROTECTION_FAULT_STATUS

[Public]

> -----Original Message-----
> From: Michael Zimmermann <sigmaepsilon92@xxxxxxxxx>
> Sent: Thursday, February 15, 2024 11:00 AM
> To: stable@xxxxxxxxxxxxxxx
> Cc: regressions@xxxxxxxxxxxxxxx; Deucher, Alexander
> <Alexander.Deucher@xxxxxxx>; Koenig, Christian
> <Christian.Koenig@xxxxxxx>; Pan, Xinhui <Xinhui.Pan@xxxxxxx>
> Subject: 7840U amdgpu MMVM_L2_PROTECTION_FAULT_STATUS
>
> I have a Framework 13 with a 7840U and started having massive GPU driver
> issues a few weeks ago (including system freezes).
> Unfortunately I no longer know exactly when this started, but it should be
> somewhere between 6.6.0 and 6.7.4.
> I got many different and random dmesg-errors and system behaviors, but I
> currently can only reproduce one, so let's focus on that for now.
>
> First some basic info:
> I'm on Arch Linux using the `linux` kernel package (currently at 6.7.4).
> I have an external monitor connected via a ThinkPad Thunderbolt 4 dock.
> I am using amdgpu.sg_display=0 and VRAM sharing is configured to
> UMA_GAME_OPTIMIZED in the firmware settings.
>
> If I start playing a YouTube video in Firefox with hardware acceleration enabled,
> it stutters until it stops playing after a few seconds. The kernel log then shows
> the following, repeated many times for many different addresses:
> [ 5641.070540] amdgpu 0000:c1:00.0: amdgpu: [mmhub] page fault (src_id:0 ring:40 vmid:1 pasid:32786, for process RDD Process pid 3680 thread firefox-bi:cs0 pid 3852)
> [ 5641.070549] amdgpu 0000:c1:00.0: amdgpu:   in page starting at address 0x0000000000020000 from client 18
> [ 5641.070553] amdgpu 0000:c1:00.0: amdgpu: MMVM_L2_PROTECTION_FAULT_STATUS:0x00143A51
> [ 5641.070556] amdgpu 0000:c1:00.0: amdgpu:      Faulty UTCL2 client ID: unknown (0x1d)
> [ 5641.070559] amdgpu 0000:c1:00.0: amdgpu:      MORE_FAULTS: 0x1
> [ 5641.070561] amdgpu 0000:c1:00.0: amdgpu:      WALKER_ERROR: 0x0
> [ 5641.070563] amdgpu 0000:c1:00.0: amdgpu:      PERMISSION_FAULTS: 0x5
> [ 5641.070565] amdgpu 0000:c1:00.0: amdgpu:      MAPPING_ERROR: 0x0
> [ 5641.070567] amdgpu 0000:c1:00.0: amdgpu:      RW: 0x1

This is a GPU page fault, i.e., the GPU accessed something that was not mapped into its virtual address space.  In this case it's GPU work from Firefox.  Did you update Mesa?  That is most often the cause of GPU page faults, e.g., a bug in the user-mode driver that causes the GPU to read past the end of a buffer or something like that.  If you could narrow down which components you changed (kernel, Mesa, firmware) and which one causes the issue, that would be helpful.  If only the kernel has changed, can you bisect?
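For reference, the decoded fields in the log come straight out of the raw status value. The following is a minimal sketch of that decoding, not the driver's code: the bit positions in FIELDS are an assumption (they are not stated in this thread), chosen because they reproduce every field value printed in the log above for 0x00143A51. The name decode_fault_status is hypothetical.

```python
# Assumed bit layout of MMVM_L2_PROTECTION_FAULT_STATUS as (low bit, width).
# These positions are an assumption; they are only checked against the
# field values that the log above prints for 0x00143A51.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),  # printed in the log as "Faulty UTCL2 client ID"
    "RW": (18, 1),
}


def decode_fault_status(status):
    """Split a raw status value into named bit fields (hypothetical helper)."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Raw value taken from the log line in the quoted report.
    for name, value in decode_fault_status(0x00143A51).items():
        print(f"{name}: {value:#x}")
```

Under that assumed layout, the value from the report decodes to MORE_FAULTS 0x1, WALKER_ERROR 0x0, PERMISSION_FAULTS 0x5, MAPPING_ERROR 0x0, CID 0x1d and RW 0x1, which matches the lines the kernel printed.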

Thanks,

Alex




