Re: [PATCH] drm/amdkfd: add schedule to remove RCU stall on CPU

Felix Kuehling <felix.kuehling@xxxxxxx> · Fri, 11 Aug 2023 17:13:50 -0400

I don't understand why this loop is causing a stall. These stall
warnings indicate that there is an RCU grace period that's not making
progress. That means there must be an RCU read-side critical section
that's being blocked. But there is no RCU read-side critical section in
the svm_range_set_attr function. You mentioned the mmap_read_lock. But
why would that cause an issue? Does it trigger any of the conditions
listed in kernel/Documentation/RCU/stallwarn.rst?
-       A CPU looping in an RCU read-side critical section.
-       A CPU looping with interrupts disabled.
-       A CPU looping with preemption disabled.
-       A CPU looping with bottom halves disabled.
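To make the listed conditions concrete, here is a toy userspace model (not the kernel's RCU code) of how a long loop can hold up a grace period. It assumes, hypothetically, that the looping CPU reports a quiescent state only when it voluntarily reschedules. That is roughly what happens for rcu_sched when a kernel without full preemption loops in kernel mode. The 60000-tick threshold is an assumption chosen to match the t=60006 jiffies in the log later in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of RCU CPU stall detection (hypothetical, NOT the kernel's
 * implementation). Each loop iteration costs `ticks_per_iter` ticks.
 * The CPU reports a quiescent state (QS) only when it voluntarily
 * reschedules, modeled as every `qs_every` iterations (0 = never).
 * A stall is flagged once the grace period has waited longer than
 * `stall_timeout` ticks for this CPU's QS.
 */
static bool loop_triggers_stall(unsigned long iters,
                                unsigned long ticks_per_iter,
                                unsigned long qs_every,
                                unsigned long stall_timeout)
{
    unsigned long ticks_since_qs = 0;

    for (unsigned long i = 1; i <= iters; i++) {
        ticks_since_qs += ticks_per_iter;
        if (ticks_since_qs > stall_timeout)
            return true;          /* grace period blocked too long */
        if (qs_every != 0 && i % qs_every == 0)
            ticks_since_qs = 0;   /* voluntary reschedule reports a QS */
    }
    return false;
}
```

With qs_every == 0 the model flags a stall as soon as the loop runs past the threshold. Any periodic reschedule resets the counter, no matter how long the loop runs in total.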

Or is there another thread that has an mmap_write_lock inside an RCU
read-side critical section that's getting stalled by the mmap_read_lock?

Regards,
  Felix


On 2023-08-11 16:50, James Zhu wrote:
On 2023-08-11 16:06, Felix Kuehling wrote:
On 2023-08-11 15:11, James Zhu wrote:
update_list can be large in list_for_each_entry(prange,
&update_list, update_list), and mmap_read_lock(mm) is held for the
whole loop. Adding schedule() to the loop can remove the RCU stall on
CPU in this case.

RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]
You're just showing the backtrace here, but not what the problem is. 
Can you include more context, e.g. the message that says something 
about a stall?
[JZ] I've attached more of the log here and will update the patch later.

2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu: INFO: rcu_sched self-detected stall on CPU
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu: #01134-....: (59947 ticks this GP) idle=7f6/1/0x4000000000000000 softirq=1735/1735 fqs=29977
2023-07-20T14:15:39-04:00 frontier06693 kernel: #011(t=60006 jiffies g=3265905 q=15150)
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu: CPU 34: RCU dump cpu stacks:
2023-07-20T14:15:39-04:00 frontier06693 kernel: NMI backtrace for cpu 34
2023-07-20T14:15:39-04:00 frontier06693 kernel: CPU: 34 PID: 72044 Comm: ncsd-it-hip.exe Kdump: loaded Tainted: G           OE 5.14.21-150400.24.46_12.0.83-cray_shasta_c #1 SLE15-SP4 (unreleased)
2023-07-20T14:15:39-04:00 frontier06693 kernel: Hardware name: HPE HPE_CRAY_EX235A/HPE CRAY EX235A, BIOS 1.6.2 03-22-2023
2023-07-20T14:15:39-04:00 frontier06693 kernel: Call Trace:
2023-07-20T14:15:39-04:00 frontier06693 kernel: <IRQ>
2023-07-20T14:15:39-04:00 frontier06693 kernel: dump_stack_lvl+0x44/0x5b
2023-07-20T14:15:39-04:00 frontier06693 kernel: nmi_cpu_backtrace+0xdd/0xe0
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? lapic_can_unplug_cpu+0xa0/0xa0
2023-07-20T14:15:39-04:00 frontier06693 kernel: nmi_trigger_cpumask_backtrace+0xfd/0x130
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu_dump_cpu_stacks+0x13b/0x180
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu_sched_clock_irq+0x6cb/0x930
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? trigger_load_balance+0x158/0x390
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? scheduler_tick+0xe1/0x290
2023-07-20T14:15:39-04:00 frontier06693 kernel: update_process_times+0x8c/0xb0
2023-07-20T14:15:39-04:00 frontier06693 kernel: tick_sched_handle.isra.21+0x1d/0x60
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? tick_sched_handle.isra.21+0x60/0x60
2023-07-20T14:15:39-04:00 frontier06693 kernel: tick_sched_timer+0x67/0x80
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? tick_sched_handle.isra.21+0x60/0x60
2023-07-20T14:15:39-04:00 frontier06693 kernel: __hrtimer_run_queues+0xa0/0x2b0
2023-07-20T14:15:39-04:00 frontier06693 kernel: hrtimer_interrupt+0xe5/0x250
2023-07-20T14:15:39-04:00 frontier06693 kernel: __sysvec_apic_timer_interrupt+0x62/0x100
2023-07-20T14:15:39-04:00 frontier06693 kernel: sysvec_apic_timer_interrupt+0x4b/0x90
2023-07-20T14:15:39-04:00 frontier06693 kernel: </IRQ>
2023-07-20T14:15:39-04:00 frontier06693 kernel: <TASK>
2023-07-20T14:15:39-04:00 frontier06693 kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
2023-07-20T14:15:39-04:00 frontier06693 kernel: RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: Code: 00 00 00 bf 00 02 00 00 48 81 c2 90 00 00 00 e8 1f 6a b9 e0 65 48 8b 14 25 00 bd 01 00 8b 42 2c 48 8b 3c 24 80 e4 f7 0b 43 d8 <89> 42 2c e8 51 dd 2d e1 48 8b 7b 38 e8 98 29 b7 e0 48 83 c4 30 b8
2023-07-20T14:15:39-04:00 frontier06693 kernel: RSP: 0018:ffffc9000ffd7b10 EFLAGS: 00000206
2023-07-20T14:15:39-04:00 frontier06693 kernel: RAX: 0000000000000100 RBX: ffff88c493968d80 RCX: ffff88d1a6469b18
2023-07-20T14:15:39-04:00 frontier06693 kernel: RDX: ffff88e18ef1ec80 RSI: ffffc9000ffd7be0 RDI: ffff88c493968d38
2023-07-20T14:15:39-04:00 frontier06693 kernel: RBP: 000000000003062e R08: 000000003042f000 R09: 000000003062efff
2023-07-20T14:15:39-04:00 frontier06693 kernel: R10: 0000000000001000 R11: ffff88c1ad255000 R12: 000000000003042f
2023-07-20T14:15:39-04:00 frontier06693 kernel: R13: ffff88c493968c00 R14: ffffc9000ffd7be0 R15: ffff88c493968c00
2023-07-20T14:15:39-04:00 frontier06693 kernel: __mmu_notifier_invalidate_range_start+0x132/0x1d0
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? amdgpu_vm_bo_update+0x3fd/0x520 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: migrate_vma_setup+0x6c7/0x8f0
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? kfd_smi_event_migration_start+0x5f/0x80 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: svm_migrate_ram_to_vram+0x14e/0x580 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: svm_range_set_attr+0xe34/0x11a0 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: kfd_ioctl+0x271/0x4e0 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? kfd_ioctl_set_xnack_mode+0xd0/0xd0 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: __x64_sys_ioctl+0x92/0xd0
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? trace_hardirqs_on+0x2a/0xc0
2023-07-20T14:15:39-04:00 frontier06693 kernel: do_syscall_64+0x42/0xc0
2023-07-20T14:15:39-04:00 frontier06693 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
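The stall header at the top of this log carries the numbers worth comparing between runs with and without the patch. As described in Documentation/RCU/stallwarn.rst, t= is roughly how many jiffies the grace period has been stalled, and g= and q= are further grace-period bookkeeping fields. The #011 prefix is most likely a syslog-escaped tab character. The helper below is a hypothetical sketch, not part of any existing tool, for pulling those three fields out of a captured log line with sscanf().

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract t=, g= and q= from an RCU stall line
 * such as "...kernel: #011(t=60006 jiffies g=3265905 q=15150)".
 * Returns 1 if all three fields were parsed, 0 otherwise.
 */
static int parse_stall_jiffies(const char *line, unsigned long *t,
                               unsigned long *g, unsigned long *q)
{
    const char *p = strstr(line, "(t=");

    if (p == NULL)
        return 0;   /* not a stall-duration line */
    return sscanf(p, "(t=%lu jiffies g=%lu q=%lu)", t, g, q) == 3;
}
```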

Code: 00 00 00 bf 00 02 00 00 48 81 c2 90 00 00 00 e8 1f 6a b9 e0 65 48 8b 14 25 00 bd 01 00 8b 42 2c 48 8b 3c 24 80 e4 f7 0b 43 d8 <89> 42 2c e8 51 dd 2d e1 48 8b 7b 38 e8 98 29 b7 e0 48 83 c4 30 b8
RSP: 0018:ffffc9000ffd7b10 EFLAGS: 00000206
RAX: 0000000000000100 RBX: ffff88c493968d80 RCX: ffff88d1a6469b18
RDX: ffff88e18ef1ec80 RSI: ffffc9000ffd7be0 RDI: ffff88c493968d38
RBP: 000000000003062e R08: 000000003042f000 R09: 000000003062efff
R10: 0000000000001000 R11: ffff88c1ad255000 R12: 000000000003042f
R13: ffff88c493968c00 R14: ffffc9000ffd7be0 R15: ffff88c493968c00
__mmu_notifier_invalidate_range_start+0x132/0x1d0
? amdgpu_vm_bo_update+0x3fd/0x520 [amdgpu]
migrate_vma_setup+0x6c7/0x8f0
? kfd_smi_event_migration_start+0x5f/0x80 [amdgpu]
svm_migrate_ram_to_vram+0x14e/0x580 [amdgpu]
svm_range_set_attr+0xe34/0x11a0 [amdgpu]
kfd_ioctl+0x271/0x4e0 [amdgpu]
? kfd_ioctl_set_xnack_mode+0xd0/0xd0 [amdgpu]
__x64_sys_ioctl+0x92/0xd0

Signed-off-by: James Zhu <James.Zhu@xxxxxxx>
---
  drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 1 +
  1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 113fd11aa96e..9f2d48ade7fa 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3573,6 +3573,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
          r = svm_range_trigger_migration(mm, prange, &migrated);
          if (r)
              goto out_unlock_range;
+        schedule();
I'm not sure that unconditionally scheduling here in every loop
iteration is a good solution. This could lead to performance
degradation when there are many small ranges. I think a better option
is to call cond_resched. That would reschedule only "if necessary",
though I haven't quite figured out the criteria for rescheduling being
necessary.
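The trade-off above can be illustrated with a toy userspace model (hypothetical, not kernel code). An unconditional schedule() enters the scheduler on every iteration, while cond_resched() does so only when the need-resched flag is set. Here that flag is assumed to be raised every need_resched_every iterations; in the real kernel the scheduler sets it, for example when the time slice runs out. One detail is not modeled. In mainline sources of this era, the non-preemptible cond_resched() also calls rcu_all_qs() to report a quiescent state when RCU urgently needs one, even if it does not reschedule. That is worth confirming against the SLE 5.14 tree from the log.

```c
#include <assert.h>
#include <stdbool.h>

/* Number of times each strategy would enter the scheduler. */
struct yield_counts {
    unsigned long with_schedule;     /* unconditional schedule() */
    unsigned long with_cond_resched; /* only when need-resched is set */
};

/*
 * Hypothetical model: `need_resched_every` says how often the
 * need-resched flag would be set during the loop (0 = never).
 */
static struct yield_counts count_yields(unsigned long n_ranges,
                                        unsigned long need_resched_every)
{
    struct yield_counts c = {0, 0};

    for (unsigned long i = 1; i <= n_ranges; i++) {
        bool need_resched = need_resched_every != 0 &&
                            i % need_resched_every == 0;

        c.with_schedule++;           /* schedule() runs every iteration */
        if (need_resched)
            c.with_cond_resched++;   /* cond_resched() yields "if necessary" */
    }
    return c;
}
```

For 10000 small ranges with the flag set every 500 iterations, the model enters the scheduler 10000 times with schedule() but only 20 times with cond_resched().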
[JZ] You are right, many small ranges would sacrifice performance. But
cond_resched is not guaranteed to remove the RCU CPU stall completely.
Maybe we could add our own condition check here, based on the
accumulated number of pranges processed.
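JZ's idea of checking the accumulated work could look roughly like the sketch below. It is hypothetical userspace C. BATCH_PAGES is an assumed tuning value that does not come from the patch. In the driver, the per-range size might be prange->last - prange->start + 1, assuming struct svm_range stores start and last as page numbers. Each time the budget is used up, the loop would yield (cond_resched() in the driver) and reset the counter.

```c
#include <assert.h>

/* Assumed tuning value, not taken from the patch. */
#define BATCH_PAGES 4096UL

/*
 * Hypothetical sketch: walk the sizes (in pages) of the processed
 * ranges and count how often the loop would yield if it yielded
 * once per BATCH_PAGES of accumulated work.
 */
static unsigned long batched_yields(const unsigned long *range_pages,
                                    unsigned long n_ranges)
{
    unsigned long accumulated = 0, yields = 0;

    for (unsigned long i = 0; i < n_ranges; i++) {
        accumulated += range_pages[i];
        if (accumulated >= BATCH_PAGES) {
            yields++;          /* the driver would call cond_resched() here */
            accumulated = 0;
        }
    }
    return yields;
}
```

Counting pages rather than ranges ties the yield frequency to the amount of work done, so a long list of small ranges does not pay for a yield on every iteration.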
Regards,
  Felix


            if (migrated && (!p->xnack_enabled ||
              (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&