I don't understand why this loop is causing a stall. These stall
warnings indicate that there is an RCU grace period that's not making
progress. That means there must be an RCU read-side critical section
that's being blocked. But there is no RCU read-side critical section in
the svm_range_set_attr function. You mentioned the mmap read lock. But
why is that causing an issue? Does it trigger any of the conditions
listed in the kernel's Documentation/RCU/stallwarn.rst?
- A CPU looping in an RCU read-side critical section.
- A CPU looping with interrupts disabled.
- A CPU looping with preemption disabled.
- A CPU looping with bottom halves disabled.
Or is there another thread that takes mmap_write_lock inside an RCU
read-side critical section and is being stalled by the mmap_read_lock?
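For reference, the first condition in that list is the classic pattern
below (a generic sketch with placeholder names, not code from this
driver):

    /* Illustration only: a CPU looping in an RCU read-side critical
     * section. While this CPU stays between rcu_read_lock() and
     * rcu_read_unlock() it never reports a quiescent state, so the
     * current grace period cannot complete, and the stall warning
     * fires once the loop runs longer than the stall timeout.
     */
    rcu_read_lock();
    list_for_each_entry_rcu(pos, &some_list, node)
        do_long_work(pos);  /* placeholder for long per-entry work */
    rcu_read_unlock();      /* the grace period can only end after this */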
Regards,
Felix
On 2023-08-11 16:50, James Zhu wrote:
On 2023-08-11 16:06, Felix Kuehling wrote:
On 2023-08-11 15:11, James Zhu wrote:
update_list can be long in list_for_each_entry(prange, &update_list,
update_list), and mmap_read_lock(mm) is held the whole time. Adding
schedule() can remove the RCU stall on the CPU for this case.
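The loop in question looks roughly like this (a simplified sketch of
svm_range_set_attr() using the names from the diff below; unrelated
details omitted):

    mmap_read_lock(mm);                 /* held across the whole walk */
    ...
    list_for_each_entry(prange, &update_list, update_list) {
        ...
        r = svm_range_trigger_migration(mm, prange, &migrated);
        if (r)
            goto out_unlock_range;
        schedule();                     /* the change proposed by this patch */
        ...
    }
    ...
    mmap_read_unlock(mm);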
RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]
You're just showing the backtrace here, but not what the problem is.
Can you include more context, e.g. the message that says something
about a stall?
[JZ] I attached more of the log here, and will update the patch with it later.
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu: INFO: rcu_sched self-detected stall on CPU
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu: #01134-....: (59947 ticks this GP) idle=7f6/1/0x4000000000000000 softirq=1735/1735 fqs=29977
2023-07-20T14:15:39-04:00 frontier06693 kernel: #011(t=60006 jiffies g=3265905 q=15150)
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu: CPU 34: RCU dump cpu stacks:
2023-07-20T14:15:39-04:00 frontier06693 kernel: NMI backtrace for cpu 34
2023-07-20T14:15:39-04:00 frontier06693 kernel: CPU: 34 PID: 72044 Comm: ncsd-it-hip.exe Kdump: loaded Tainted: G OE 5.14.21-150400.24.46_12.0.83-cray_shasta_c #1 SLE15-SP4 (unreleased)
2023-07-20T14:15:39-04:00 frontier06693 kernel: Hardware name: HPE HPE_CRAY_EX235A/HPE CRAY EX235A, BIOS 1.6.2 03-22-2023
2023-07-20T14:15:39-04:00 frontier06693 kernel: Call Trace:
2023-07-20T14:15:39-04:00 frontier06693 kernel: <IRQ>
2023-07-20T14:15:39-04:00 frontier06693 kernel: dump_stack_lvl+0x44/0x5b
2023-07-20T14:15:39-04:00 frontier06693 kernel: nmi_cpu_backtrace+0xdd/0xe0
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? lapic_can_unplug_cpu+0xa0/0xa0
2023-07-20T14:15:39-04:00 frontier06693 kernel: nmi_trigger_cpumask_backtrace+0xfd/0x130
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu_dump_cpu_stacks+0x13b/0x180
2023-07-20T14:15:39-04:00 frontier06693 kernel: rcu_sched_clock_irq+0x6cb/0x930
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? trigger_load_balance+0x158/0x390
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? scheduler_tick+0xe1/0x290
2023-07-20T14:15:39-04:00 frontier06693 kernel: update_process_times+0x8c/0xb0
2023-07-20T14:15:39-04:00 frontier06693 kernel: tick_sched_handle.isra.21+0x1d/0x60
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? tick_sched_handle.isra.21+0x60/0x60
2023-07-20T14:15:39-04:00 frontier06693 kernel: tick_sched_timer+0x67/0x80
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? tick_sched_handle.isra.21+0x60/0x60
2023-07-20T14:15:39-04:00 frontier06693 kernel: __hrtimer_run_queues+0xa0/0x2b0
2023-07-20T14:15:39-04:00 frontier06693 kernel: hrtimer_interrupt+0xe5/0x250
2023-07-20T14:15:39-04:00 frontier06693 kernel: __sysvec_apic_timer_interrupt+0x62/0x100
2023-07-20T14:15:39-04:00 frontier06693 kernel: sysvec_apic_timer_interrupt+0x4b/0x90
2023-07-20T14:15:39-04:00 frontier06693 kernel: </IRQ>
2023-07-20T14:15:39-04:00 frontier06693 kernel: <TASK>
2023-07-20T14:15:39-04:00 frontier06693 kernel: asm_sysvec_apic_timer_interrupt+0x12/0x20
2023-07-20T14:15:39-04:00 frontier06693 kernel: RIP: 0010:svm_range_cpu_invalidate_pagetables+0x317/0x610 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: Code: 00 00 00 bf 00 02 00 00 48 81 c2 90 00 00 00 e8 1f 6a b9 e0 65 48 8b 14 25 00 bd 01 00 8b 42 2c 48 8b 3c 24 80 e4 f7 0b 43 d8 <89> 42 2c e8 51 dd 2d e1 48 8b 7b 38 e8 98 29 b7 e0 48 83 c4 30 b8
2023-07-20T14:15:39-04:00 frontier06693 kernel: RSP: 0018:ffffc9000ffd7b10 EFLAGS: 00000206
2023-07-20T14:15:39-04:00 frontier06693 kernel: RAX: 0000000000000100 RBX: ffff88c493968d80 RCX: ffff88d1a6469b18
2023-07-20T14:15:39-04:00 frontier06693 kernel: RDX: ffff88e18ef1ec80 RSI: ffffc9000ffd7be0 RDI: ffff88c493968d38
2023-07-20T14:15:39-04:00 frontier06693 kernel: RBP: 000000000003062e R08: 000000003042f000 R09: 000000003062efff
2023-07-20T14:15:39-04:00 frontier06693 kernel: R10: 0000000000001000 R11: ffff88c1ad255000 R12: 000000000003042f
2023-07-20T14:15:39-04:00 frontier06693 kernel: R13: ffff88c493968c00 R14: ffffc9000ffd7be0 R15: ffff88c493968c00
2023-07-20T14:15:39-04:00 frontier06693 kernel: __mmu_notifier_invalidate_range_start+0x132/0x1d0
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? amdgpu_vm_bo_update+0x3fd/0x520 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: migrate_vma_setup+0x6c7/0x8f0
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? kfd_smi_event_migration_start+0x5f/0x80 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: svm_migrate_ram_to_vram+0x14e/0x580 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: svm_range_set_attr+0xe34/0x11a0 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: kfd_ioctl+0x271/0x4e0 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? kfd_ioctl_set_xnack_mode+0xd0/0xd0 [amdgpu]
2023-07-20T14:15:39-04:00 frontier06693 kernel: __x64_sys_ioctl+0x92/0xd0
2023-07-20T14:15:39-04:00 frontier06693 kernel: ? trace_hardirqs_on+0x2a/0xc0
2023-07-20T14:15:39-04:00 frontier06693 kernel: do_syscall_64+0x42/0xc0
2023-07-20T14:15:39-04:00 frontier06693 kernel: entry_SYSCALL_64_after_hwframe+0x61/0xcb
Code: 00 00 00 bf 00 02 00 00 48 81 c2 90 00 00 00 e8 1f 6a b9 e0 65 48 8b 14 25 00 bd 01 00 8b 42 2c 48 8b 3c 24 80 e4 f7 0b 43 d8 <89> 42 2c e8 51 dd 2d e1 48 8b 7b 38 e8 98 29 b7 e0 48 83 c4 30 b8
RSP: 0018:ffffc9000ffd7b10 EFLAGS: 00000206
RAX: 0000000000000100 RBX: ffff88c493968d80 RCX: ffff88d1a6469b18
RDX: ffff88e18ef1ec80 RSI: ffffc9000ffd7be0 RDI: ffff88c493968d38
RBP: 000000000003062e R08: 000000003042f000 R09: 000000003062efff
R10: 0000000000001000 R11: ffff88c1ad255000 R12: 000000000003042f
R13: ffff88c493968c00 R14: ffffc9000ffd7be0 R15: ffff88c493968c00
__mmu_notifier_invalidate_range_start+0x132/0x1d0
? amdgpu_vm_bo_update+0x3fd/0x520 [amdgpu]
migrate_vma_setup+0x6c7/0x8f0
? kfd_smi_event_migration_start+0x5f/0x80 [amdgpu]
svm_migrate_ram_to_vram+0x14e/0x580 [amdgpu]
svm_range_set_attr+0xe34/0x11a0 [amdgpu]
kfd_ioctl+0x271/0x4e0 [amdgpu]
? kfd_ioctl_set_xnack_mode+0xd0/0xd0 [amdgpu]
__x64_sys_ioctl+0x92/0xd0
Signed-off-by: James Zhu <James.Zhu@xxxxxxx>
---
drivers/gpu/drm/amd/amdkfd/kfd_svm.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
index 113fd11aa96e..9f2d48ade7fa 100644
--- a/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
+++ b/drivers/gpu/drm/amd/amdkfd/kfd_svm.c
@@ -3573,6 +3573,7 @@ svm_range_set_attr(struct kfd_process *p, struct mm_struct *mm,
r = svm_range_trigger_migration(mm, prange, &migrated);
if (r)
goto out_unlock_range;
+ schedule();
I'm not sure that unconditionally scheduling here in every loop
iteration is a good solution. This could lead to performance
degradation when there are many small ranges. I think a better option
is to call cond_resched(). That would only reschedule "if necessary",
though I haven't quite figured out the criteria for rescheduling being
necessary.
[JZ] You are right, small ranges will sacrifice performance, but
cond_resched() has no guarantee of removing the RCU stall on the CPU
completely. Maybe we could add our own condition check here based on
the accumulated number of pranges that have been processed.
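For example, something along these lines (an untested sketch; the
batch size of 32 is made up and would need tuning):

    /* Untested sketch: yield only after every N processed ranges, so
     * workloads with many small ranges do not pay for a reschedule on
     * every iteration.  N (here 32) is a made-up threshold.
     */
    unsigned int processed = 0;

    list_for_each_entry(prange, &update_list, update_list) {
        ...
        if (!(++processed % 32))
            cond_resched();
    }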
Regards,
Felix
if (migrated && (!p->xnack_enabled ||
    (prange->flags & KFD_IOCTL_SVM_FLAG_GPU_ALWAYS_MAPPED)) &&