[Bug 205089] amdgpu : drm:amdgpu_cs_ioctl : Failed to initialize parser -125

https://bugzilla.kernel.org/show_bug.cgi?id=205089

Manuel Jesús de la Fuente (m@xxxxxxxxx) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |m@xxxxxxxxx

--- Comment #40 from Manuel Jesús de la Fuente (m@xxxxxxxxx) ---
Can still reproduce using the following:

- Ryzen 9 5900XT
- Radeon RX 6700XT

- Linux 5.17.4-1-default (openSUSE Tumbleweed with KDE Plasma)
- Mesa 22.0.2-308.2

May 08 20:18:32 localhost.localdomain kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* ring gfx_0.0.0 timeout, signaled seq=2371535, emitted
seq=2371537
May 08 20:18:32 localhost.localdomain kernel: [drm:amdgpu_job_timedout
[amdgpu]] *ERROR* Process information: process kwin_x11 pid 1795 thread
kwin_x11:cs0 pid 1801
May 08 20:18:32 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: GPU
reset begin!
May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0:
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed
(-110)
May 08 20:18:33 localhost.localdomain kernel: [drm:gfx_v10_0_hw_fini [amdgpu]]
*ERROR* KGQ disable failed
May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0:
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed
(-110)
May 08 20:18:33 localhost.localdomain kernel: [drm:gfx_v10_0_hw_fini [amdgpu]]
*ERROR* KCQ disable failed
May 08 20:18:33 localhost.localdomain kernel: [drm:gfx_v10_0_hw_fini [amdgpu]]
*ERROR* failed to halt cp gfx
May 08 20:18:33 localhost.localdomain kernel: [drm] free PSP TMR buffer
May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu:
MODE1 reset
May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: GPU
mode1 reset
May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: GPU
smu mode1 reset
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: GPU
reset succeeded, trying to resume
May 08 20:18:34 localhost.localdomain kernel: [drm] PCIE GART of 512M enabled
(table at 0x0000008000300000).
May 08 20:18:34 localhost.localdomain kernel: [drm] VRAM is lost due to GPU
reset!
May 08 20:18:34 localhost.localdomain kernel: [drm] PSP is resuming...
May 08 20:18:34 localhost.localdomain kernel: [drm] reserve 0xa00000 from
0x82fe000000 for PSP TMR
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: RAS:
optional ras ta ucode is not available
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu:
SECUREDISPLAY: securedisplay ta ucode is not available
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: SMU
is resuming...
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: smu
driver if version = 0x0000000e, smu fw if version = 0x00000012, smu fw version
= 0x00413500 (65.53.0)
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: SMU
driver if version not matched
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: SMU
is resumed successfully!
May 08 20:18:34 localhost.localdomain kernel: [drm] DMUB hardware initialized:
version=0x0202000C
May 08 20:18:34 localhost.localdomain kernel: [drm] kiq ring mec 2 pipe 1 q 0
May 08 20:18:34 localhost.localdomain kernel: [drm] VCN decode and encode
initialized successfully(under DPG Mode).
May 08 20:18:34 localhost.localdomain kernel: [drm] JPEG decode initialized
successfully.
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
gfx_0.0.0 uses VM inv eng 0 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
comp_1.0.0 uses VM inv eng 1 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
comp_1.1.0 uses VM inv eng 4 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
comp_1.2.0 uses VM inv eng 5 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
comp_1.3.0 uses VM inv eng 6 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
comp_1.0.1 uses VM inv eng 7 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
comp_1.1.1 uses VM inv eng 8 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
comp_1.2.1 uses VM inv eng 9 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
comp_1.3.1 uses VM inv eng 10 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
kiq_2.1.0 uses VM inv eng 11 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
sdma0 uses VM inv eng 12 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
sdma1 uses VM inv eng 13 on hub 0
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
vcn_dec_0 uses VM inv eng 0 on hub 1
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
vcn_enc_0.0 uses VM inv eng 1 on hub 1
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
vcn_enc_0.1 uses VM inv eng 4 on hub 1
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: ring
jpeg_dec uses VM inv eng 5 on hub 1
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu:
recover vram bo from shadow start
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu:
recover vram bo from shadow done
May 08 20:18:34 localhost.localdomain kernel: [drm] Skip scheduling IBs!
May 08 20:18:34 localhost.localdomain kernel: [drm] Skip scheduling IBs!
May 08 20:18:34 localhost.localdomain kernel: amdgpu 0000:2d:00.0: amdgpu: GPU
reset(2) succeeded!
May 08 20:18:34 localhost.localdomain kernel: [drm] Skip scheduling IBs!

[ ... the previous line, but loads of times ]

May 08 20:18:34 localhost.localdomain kernel: [drm] Skip scheduling IBs!
May 08 20:18:34 localhost.localdomain kernel: amdgpu_cs_ioctl: 46 callbacks
suppressed
May 08 20:18:34 localhost.localdomain kernel: [drm:amdgpu_cs_ioctl [amdgpu]]
*ERROR* Failed to initialize parser -125!

[ ... the previous line, but loads of times. These are the '-125!' ones ]

May 08 20:18:44 localhost.localdomain kernel: [drm:amdgpu_cs_ioctl [amdgpu]]
*ERROR* Failed to initialize parser -125!
May 08 20:18:44 localhost.localdomain xembedsniproxy[1862]: Container window
visible, stack below
May 08 20:18:44 localhost.localdomain kernel: [drm:amdgpu_cs_ioctl [amdgpu]]
*ERROR* Failed to initialize parser -125!
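
Because the journal output above is hard-wrapped by the mailer, the error lines are awkward to count by eye. The following is a minimal sketch, not part of the original report, that rejoins wrapped entries, tallies the `*ERROR*` messages, and maps the negative codes to names. It assumes the codes are standard Linux errno values (110 is ETIMEDOUT, 125 is ECANCELED). The function names `unwrap`, `summarize`, and `decode` are hypothetical helpers made up for this illustration.

```python
import re
from collections import Counter

# Assumption: the negative numbers in the log are Linux errno values.
# Hardcoded so the sketch gives the same answer on any OS.
ERRNO_NAMES = {110: "ETIMEDOUT", 125: "ECANCELED"}

# A journal entry starts with "Mon DD HH:MM:SS host source: ";
# anything else is treated as a wrapped continuation of the previous entry.
ENTRY_START = re.compile(r"^[A-Z][a-z]{2} \d{2} \d{2}:\d{2}:\d{2} \S+ [^:]+: ")
ERROR = re.compile(r"\*ERROR\* (.*)$")
CODE = re.compile(r"(?:\(|\s)-(\d+)\b")


def unwrap(lines):
    """Rejoin hard-wrapped journal lines into one string per entry."""
    entries = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line.strip()
    return entries


def summarize(entries):
    """Count how often each *ERROR* message text occurs."""
    counts = Counter()
    for entry in entries:
        match = ERROR.search(entry)
        if match:
            counts[match.group(1)] += 1
    return counts


def decode(message):
    """Return the errno name for a trailing negative code, or None."""
    match = CODE.search(message)
    if not match:
        return None
    return ERRNO_NAMES.get(int(match.group(1)))


# Sample lines copied from the log above.
SAMPLE = """\
May 08 20:18:33 localhost.localdomain kernel: amdgpu 0000:2d:00.0:
[drm:amdgpu_ring_test_helper [amdgpu]] *ERROR* ring kiq_2.1.0 test failed
(-110)
May 08 20:18:34 localhost.localdomain kernel: [drm:amdgpu_cs_ioctl [amdgpu]]
*ERROR* Failed to initialize parser -125!
May 08 20:18:44 localhost.localdomain kernel: [drm:amdgpu_cs_ioctl [amdgpu]]
*ERROR* Failed to initialize parser -125!
"""

if __name__ == "__main__":
    for message, count in summarize(unwrap(SAMPLE.splitlines())).items():
        print(count, decode(message), message)
```

If the errno assumption holds, the kiq ring test failures are timeouts and the parser failures are cancelled submissions. That would be consistent with command submissions being rejected after the reset, but I have not confirmed this against the driver source.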


One interesting detail and partial workaround: underclocking the RAM reduces
how often this happens. Setting it to 2400 specifically (the native speed of
the 32GB of RAM is 3600) makes it happen much less often, though it still does
happen.

Another thing is that it might somehow be related to the GPU's built-in audio
conflicting with the snd_hda_intel driver, which shows up in a few other
people's logs (and sometimes in mine too). Audio is also choppy until I restart
Pulse with pulseaudio -k, which might be the cause of this first freeze with RAM
at 2400. This may be unrelated though, and is just conjecture on my part.

Happy to help debug the issue if anyone can guide me through the process a bit.
I will also look into reporting this on the Mesa side.
