Re: [PATCH 1/5] drm/amdgpu: Introduce gfx software ring (v8)

On 20.10.22 at 16:49, Michel Dänzer wrote:
On 2022-10-18 11:08, jiadong.zhu@xxxxxxx wrote:
From: "Jiadong.Zhu" <Jiadong.Zhu@xxxxxxx>

The software ring is created to support priority contexts when there is only
one hardware queue for gfx.

Every software ring has its own fence driver and can be used as an ordinary
ring by the GPU scheduler.
Multiple software rings are bound to a real ring through the ring muxer. The
packets committed on a software ring are copied to the real ring.
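
As a rough illustration of the copy step described above (not the code from the patch), a minimal sketch could look like the following. All names here (sketch_ring, sketch_ring_write, sketch_mux_copy, RING_DWORDS) are hypothetical, and the sketch assumes a power-of-two ring size and monotonically increasing write pointers; locking, fences and preemption are left out.

```c
#include <assert.h>
#include <stdint.h>

#define RING_DWORDS 8u /* assumed power of two so indices wrap with a mask */

/* Hypothetical ring: used here for both the software and the real ring. */
struct sketch_ring {
	uint32_t buf[RING_DWORDS];
	uint32_t wptr; /* monotonically increasing write pointer, in dwords */
};

/* Write one dword at the current write pointer, wrapping at the buffer end. */
static void sketch_ring_write(struct sketch_ring *ring, uint32_t v)
{
	ring->buf[ring->wptr & (RING_DWORDS - 1)] = v;
	ring->wptr++;
}

/*
 * Copy the dwords in [start, end) of a software ring to the real ring.
 * start and end are software ring write pointer values; the range may
 * wrap around the end of the software ring buffer.
 */
static void sketch_mux_copy(struct sketch_ring *real,
			    const struct sketch_ring *sw,
			    uint32_t start, uint32_t end)
{
	/* The range must fit in the ring, or older data was overwritten. */
	assert(end - start <= RING_DWORDS);

	for (uint32_t p = start; p != end; p++)
		sketch_ring_write(real, sw->buf[p & (RING_DWORDS - 1)]);
}
```

In this sketch each software ring keeps its own write pointer, and only the muxer ever writes to the real ring, which is what lets several software rings share one hardware queue.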

v2: Use array to store software ring entry.
v3: Remove unnecessary prints.
v4: Remove amdgpu_ring_sw_init/fini functions;
use GTT for the sw ring buffer for a later DMA copy
optimization.
v5: Allocate ring entry dynamically in the muxer.
v6: Update comments for the ring muxer.
v7: Modify for function naming.
v8: Combine software ring functions into amdgpu_ring_mux.c

I tested patches 1-4 of this series (since Christian clearly NACKed patch 5) and hit the oops below.

As long as you don't try to reset the GPU, you can also test patch 5. It's just that we can't upstream it like this, or that would break immediately.

Regards,
Christian.


amdgpu_sw_ring_ib_begin+0x70/0x80 is in amdgpu_mcbp_trigger_preempt according to scripts/faddr2line, specifically line 376:

	spin_unlock(&mux->lock);

though I'm not sure why that would crash.
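
One observation on the trace: "RIP: 0010:0x0" together with "#PF: supervisor instruction fetch" means the CPU tried to execute code at address 0, which usually indicates a call through a NULL function pointer rather than a fault in the unlock itself. Whether that is what happens here is an assumption, not something the trace proves. A minimal sketch of guarding an optional callback, with entirely hypothetical names (sketch_ring_funcs, sketch_trigger_preempt), could look like this:

```c
#include <errno.h>
#include <stddef.h>

struct sketch_ring;

/* Hypothetical per-ring callback table; preempt is optional. */
struct sketch_ring_funcs {
	int (*preempt)(struct sketch_ring *ring); /* may be NULL */
};

struct sketch_ring {
	const struct sketch_ring_funcs *funcs;
	int preempt_count; /* how often the callback has run */
};

/* Call the preempt callback only if the ring actually provides one. */
static int sketch_trigger_preempt(struct sketch_ring *ring)
{
	if (!ring->funcs || !ring->funcs->preempt)
		return -EOPNOTSUPP; /* instead of jumping to address 0 */

	return ring->funcs->preempt(ring);
}

/* Example callback for a ring that supports preemption. */
static int sketch_preempt_impl(struct sketch_ring *ring)
{
	ring->preempt_count++;
	return 0;
}
```

With a check like this, a ring whose callback table lacks the hook returns an error instead of producing an instruction fetch fault.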


Are you not able to reproduce this with a GNOME Wayland session?


BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor instruction fetch in kernel mode
#PF: error_code(0x0010) - not-present page
PGD 0 P4D 0
Oops: 0010 [#1] PREEMPT SMP NOPTI
CPU: 7 PID: 281 Comm: gfx_high Tainted: G            E      6.0.2+ #1
Hardware name: LENOVO 20NF0000GE/20NF0000GE, BIOS R11ET36W (1.16 ) 03/30/2020
RIP: 0010:0x0
Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
RSP: 0018:ffffbd594073bdc8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff993c4a620000 RCX: 0000000000000000
RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff993c4a62a350
RBP: ffff993c4a62d530 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000114
R13: ffff993c4a620000 R14: 0000000000000000 R15: ffff993c4a62d128
FS:  0000000000000000(0000) GS:ffff993ef0bc0000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffffffffd6 CR3: 00000001959fc000 CR4: 00000000003506e0
Call Trace:
  <TASK>
  amdgpu_sw_ring_ib_begin+0x70/0x80 [amdgpu]
  amdgpu_ib_schedule+0x15f/0x5d0 [amdgpu]
  amdgpu_job_run+0x102/0x1c0 [amdgpu]
  drm_sched_main+0x19a/0x440 [gpu_sched]
  ? dequeue_task_stop+0x70/0x70
  ? drm_sched_resubmit_jobs+0x10/0x10 [gpu_sched]
  kthread+0xe9/0x110
  ? kthread_complete_and_exit+0x20/0x20
  ret_from_fork+0x22/0x30
  </TASK>
[...]
note: gfx_high[281] exited with preempt_count 1
[...]
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* ring gfx_low timeout, signaled seq=14864, emitted seq=14865
[drm:amdgpu_job_timedout [amdgpu]] *ERROR* Process information: process firefox.dpkg-di pid 3540 thread firefox:cs0 pid 4666
amdgpu 0000:05:00.0: amdgpu: GPU reset begin!





