On 2022-10-18 11:08, jiadong.zhu@xxxxxxx wrote:
> From: "Jiadong.Zhu" <Jiadong.Zhu@xxxxxxx>
>
> The software ring is created to support priority context while there is only
> one hardware queue for gfx.
>
> Every software ring has its fence driver and could be used as an ordinary ring
> for the GPU scheduler.
> Multiple software rings are bound to a real ring with the ring muxer. The
> packages committed on the software ring are copied to the real ring.
>
> v2: Use array to store software ring entry.
> v3: Remove unnecessary prints.
> v4: Remove amdgpu_ring_sw_init/fini functions,
>     using gtt for sw ring buffer for later dma copy
>     optimization.
> v5: Allocate ring entry dynamically in the muxer.
> v6: Update comments for the ring muxer.
> v7: Modify for function naming.
> v8: Combine software ring functions into amdgpu_ring_mux.c

I tested patches 1-4 of this series (since Christian clearly nacked patch 5). I hit the oops below.

amdgpu_sw_ring_ib_begin+0x70/0x80 is in amdgpu_mcbp_trigger_preempt according to scripts/faddr2line, specifically line 376:

        spin_unlock(&mux->lock);

though I'm not sure why that would crash.

Are you not able to reproduce this with a GNOME Wayland session?


BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 0 P4D 0
Oops: 0010 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 281 Comm: gfx_high Tainted: G E 6.0.2+ #1
Hardware name: LENOVO 20NF0000GE/20NF0000GE, BIOS R11ET36W (1.16 ) 03/30/2020
RIP: 0010:0x0
Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
RSP: 0018:ffffbd594073bdc8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff993c4a620000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff993c4a62a350
RBP: ffff993c4a62d530 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000114
R13: ffff993c4a620000 R14: 0000000000000000 R15: ffff993c4a62d128
FS: 0000000000000000(0000) GS:ffff993ef0bc0000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000001959fc000 CR4: 00000000003506e0
Call Trace:
 <TASK>
 amdgpu_sw_ring_ib_begin+0x70/0x80 [amdgpu]
 amdgpu_ib_schedule+0x15f/0x5d0 [amdgpu]
 amdgpu_job_run+0x102/0x1c0 [amdgpu]
 drm_sched_main+0x19a/0x440 [gpu_sched]
 ? dequeue_task_stop+0x70/0x70
 ? drm_sched_resubmit_jobs+0x10/0x10 [gpu_sched]
 kthread+0xe9/0x110
 ? kthread_complete_and_exit+0x20/0x20
 ret_from_fork+0x22/0x30
 </TASK>
[...]
note: gfx_high[281] exited with preempt_count 1
[...]
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=14864, emitted seq=14865
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox.dpkg-di pid 3540 thread firefox:cs0 pid 4666
amdgpu 0000:05:00.0: amdgpu: GPU reset begin!


-- 
Earthling Michel Dänzer            |                  https://redhat.com
Libre software enthusiast          |         Mesa and Xwayland developer