[Bug 111231] VM_L2_PROTECTION_FAULT

bugzilla-daemon@xxxxxxxxxxxxxxx · Sat, 27 Jul 2019 13:12:20 +0000

Bug ID:     111231
Summary:    VM_L2_PROTECTION_FAULT
Product:    DRI
Version:    XOrg git
Hardware:   x86-64 (AMD64)
OS:         Linux (All)
Status:     NEW
Severity:   major
Priority:   medium
Component:  DRM/AMDgpu
Assignee:   dri-devel@lists.freedesktop.org
Reporter:   ds2.bugs.freedesktop@gmail.com

When playing minetest on an AMD Ryzen 2200G with Vega integrated graphics,
the system occasionally appears to suffer a graphics lock-up during game
load, while the loading bar is shown.
When this occurs, dmesg reports a VM_L2_PROTECTION_FAULT followed by repeated
errors about fence timeouts:

[ 5699.136659] amdgpu 0000:0b:00.0: [gfxhub] no-retry page fault (src_id:0
ring:155 vmid:5 pasid:32770, for process minetest pid 7127 thread minetest:cs0
pid 7133)
[ 5699.136662] amdgpu 0000:0b:00.0:   in page starting at address
0x000080014034d000 from 27
[ 5699.136664] amdgpu 0000:0b:00.0: VM_L2_PROTECTION_FAULT_STATUS:0x00501136
[ 5704.343299] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out.
[ 5709.259775] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=443165, emitted seq=443167
[ 5709.259860] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process minetest pid 7127 thread minetest:cs0 pid 7133
[ 5709.259862] [drm] GPU recovery disabled.
[ 5709.463238] [drm:amdgpu_dm_atomic_commit_tail [amdgpu]] *ERROR* Waiting for
fences timed out.
[ 5719.286451] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=443165, emitted seq=443167
[ 5719.286537] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process minetest pid 7127 thread minetest:cs0 pid 7133
[ 5719.286539] [drm] GPU recovery disabled.
[ 5729.312836] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=443165, emitted seq=443167
[ 5729.312921] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process minetest pid 7127 thread minetest:cs0 pid 7133
[ 5729.312923] [drm] GPU recovery disabled.
[ 5739.339485] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=443165, emitted seq=443167
[ 5739.339570] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process minetest pid 7127 thread minetest:cs0 pid 7133
[ 5739.339572] [drm] GPU recovery disabled.
[ 5749.366552] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx timeout,
signaled seq=443165, emitted seq=443167
[ 5749.366637] [drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information:
process minetest pid 7127 thread minetest:cs0 pid 7133
[ 5749.366640] [drm] GPU recovery disabled.
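
For anyone reading the log above, the VM_L2_PROTECTION_FAULT_STATUS value can
be split into bit fields. The sketch below is illustrative only: the field
offsets are an assumption based on the GFX9 register layout as I understand
it, and should be checked against the amdgpu register headers in the kernel
tree. The function name decode_fault_status is hypothetical.

```python
# Assumed GFX9 layout of VM_L2_PROTECTION_FAULT_STATUS as (shift, width).
# These offsets are an assumption; verify them against the kernel's amdgpu
# register headers before relying on the decoded values.
FIELDS = {
    "MORE_FAULTS": (0, 1),
    "WALKER_ERROR": (1, 3),
    "PERMISSION_FAULTS": (4, 4),
    "MAPPING_ERROR": (8, 1),
    "CID": (9, 9),
    "RW": (18, 1),
    "VMID": (20, 4),
}


def decode_fault_status(status: int) -> dict:
    """Extract each assumed bit field from the raw status register value."""
    return {
        name: (status >> shift) & ((1 << width) - 1)
        for name, (shift, width) in FIELDS.items()
    }


if __name__ == "__main__":
    # Status value taken from the dmesg output in this report.
    for name, value in decode_fault_status(0x00501136).items():
        print(f"{name:18s} 0x{value:x}")
```

Under this assumed layout the VMID field of 0x00501136 decodes to 5, which
agrees with the "vmid:5" printed on the page fault line, so the layout is at
least self-consistent with this log.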

The lock-up does not happen on every normal minetest session, but when it
does, the screen gets a light covering of graphical corruption "confetti"
(photos to follow - they had to be taken on a phone, sorry).
I am currently running a Mesa debug build compiled from git at commit
b0626c1f306, after checking whether
https://bugs.freedesktop.org/show_bug.cgi?id=105251 had anything to do with
it. I think this bug is related but not entirely a duplicate: a fix mentioned
there stopped that bug's test program from having an effect, but did not stop
this problem.

In the course of trying to reproduce this problem in a more repeatable manner,
I decided to take an apitrace (will attach in following messages).
Interestingly, the brief trace I took did not crash my system during recording
of it, but now replaying it will fairly regularly cause the same kind of
lockup, more frequently than the game itself will.
I ran apitrace replay in verbose mode to see where it stopped, hoping this
would give an approximate indication of where things started going
pear-shaped. The output ends well short of the full apitrace dump, as
expected from what I saw, and stderr additionally appears to contain an
exception of some kind. See the apitrace.out.txt and apitrace.err.txt
attachments (to follow separately).

I haven't yet captured dmesg output from a run of minetest itself, but I do
have some captures (each spanning from boot to either a hard or soft reboot;
sometimes Xorg was killable, other times not) from replaying the offending
apitrace. These will also be attached in follow-up messages.
They appear to show many more GPU faults before the timeout messages
appear.
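
To compare captures, the number of page faults logged before the first ring
timeout can be counted with a short script. This is a minimal sketch, not
part of the attachments: it assumes raw dmesg output with one entry per line
(not the wrapped form shown in this mail), and the function name
faults_before_first_timeout and the file name dmesg.txt are hypothetical.

```python
import re

# Matches amdgpu page fault lines such as "[gfxhub] no-retry page fault (...)".
FAULT_RE = re.compile(r"\[\s*(\d+\.\d+)\].*page fault")
# Matches "ring gfx timeout, signaled seq=N, emitted seq=M" lines.
TIMEOUT_RE = re.compile(
    r"\[\s*(\d+\.\d+)\].*ring (\w+) timeout, "
    r"signaled seq=(\d+), emitted seq=(\d+)"
)


def faults_before_first_timeout(lines):
    """Return (fault_count, first_timeout), where first_timeout is a dict
    describing the first ring timeout, or None if no timeout was logged."""
    fault_count = 0
    for line in lines:
        timeout = TIMEOUT_RE.search(line)
        if timeout:
            signaled = int(timeout.group(3))
            emitted = int(timeout.group(4))
            return fault_count, {
                "time": float(timeout.group(1)),
                "ring": timeout.group(2),
                "signaled": signaled,
                "emitted": emitted,
                "pending": emitted - signaled,
            }
        if FAULT_RE.search(line):
            fault_count += 1
    return fault_count, None


if __name__ == "__main__":
    # "dmesg.txt" is a placeholder for a saved, unwrapped dmesg capture.
    with open("dmesg.txt", encoding="utf-8", errors="replace") as fh:
        count, first = faults_before_first_timeout(fh)
    print(f"page faults before first timeout: {count}")
    print(f"first timeout: {first}")
```

The "pending" value is simply emitted minus signaled, i.e. how many fences
were still outstanding on the ring when the timeout was reported.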

