On 2/12/2025 10:32 PM, Davidlohr Bueso wrote:
On Sun, 01 Dec 2024, Raghavendra K T wrote:
6. Holding PTE lock before migration.
FYI, I tried testing this series with 'perf-bench numa mem' and got a
soft lockup, unable to take the PTL (and I have lost the machine, so I
cannot debug further atm), i.e.:
[ 3852.217675] CPU: 127 UID: 0 PID: 12537 Comm: watch-numa-sche Tainted: G D L 6.14.0-rc2-kmmscand-v1+ #3
[ 3852.217677] Tainted: [D]=DIE, [L]=SOFTLOCKUP
[ 3852.217678] RIP: 0010:native_queued_spin_lock_slowpath+0x64/0x290
[ 3852.217683] Code: 77 7b f0 0f ba 2b 08 0f 92 c2 8b 03 0f b6 d2 c1 e2 08 30 e4 09 d0 3d ff 00 00 00 77 57 85 c0 74 10 0f b6 03 84 c0 74 09 f3 90 <0f> b6 03 84 c0 75 f7 b8 01 00 00 00 66 89 03 5b 5d 41 5c 41 5d c3
[ 3852.217684] RSP: 0018:ff274259b3c9f988 EFLAGS: 00000202
[ 3852.217685] RAX: 0000000000000001 RBX: ffbd2efd8c08c9a8 RCX: 000ffffffffff000
[ 3852.217686] RDX: 0000000000000000 RSI: 0000000000000001 RDI: ffbd2efd8c08c9a8
[ 3852.217687] RBP: ff161328422c1328 R08: ff274259b3c9fb90 R09: ff161328422c1000
[ 3852.217688] R10: 00000000ffffffff R11: 0000000000000004 R12: 00007f52cca00000
[ 3852.217688] R13: ff274259b3c9fa00 R14: ff16132842326000 R15: ff161328422c1328
[ 3852.217689] FS: 00007f32b6f92b80(0000) GS:ff161423bfd80000(0000) knlGS:0000000000000000
[ 3852.217691] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3852.217692] CR2: 0000564ddbf68008 CR3: 00000080a81cc005 CR4: 0000000000773ef0
[ 3852.217693] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3852.217694] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[ 3852.217694] PKRU: 55555554
[ 3852.217695] Call Trace:
[ 3852.217696] <IRQ>
[ 3852.217697] ? watchdog_timer_fn+0x21b/0x2a0
[ 3852.217699] ? __pfx_watchdog_timer_fn+0x10/0x10
[ 3852.217702] ? __hrtimer_run_queues+0x10f/0x2a0
[ 3852.217704] ? hrtimer_interrupt+0xfb/0x240
[ 3852.217706] ? __sysvec_apic_timer_interrupt+0x4e/0x110
[ 3852.217709] ? sysvec_apic_timer_interrupt+0x68/0x90
[ 3852.217712] </IRQ>
[ 3852.217712] <TASK>
[ 3852.217713] ? asm_sysvec_apic_timer_interrupt+0x16/0x20
[ 3852.217717] ? native_queued_spin_lock_slowpath+0x64/0x290
[ 3852.217720] _raw_spin_lock+0x25/0x30
[ 3852.217723] __pte_offset_map_lock+0x9a/0x110
[ 3852.217726] gather_pte_stats+0x1e3/0x2c0
[ 3852.217730] walk_pgd_range+0x528/0xbb0
[ 3852.217733] __walk_page_range+0x71/0x1d0
[ 3852.217736] walk_page_vma+0x98/0xf0
[ 3852.217738] show_numa_map+0x11a/0x3a0
[ 3852.217741] seq_read_iter+0x2a6/0x470
[ 3852.217745] seq_read+0x12b/0x170
[ 3852.217748] vfs_read+0xe0/0x370
[ 3852.217751] ? syscall_exit_to_user_mode+0x49/0x210
[ 3852.217755] ? do_syscall_64+0x8a/0x190
[ 3852.217758] ksys_read+0x6a/0xe0
[ 3852.217762] do_syscall_64+0x7e/0x190
[ 3852.217765] ? __memcg_slab_free_hook+0xd4/0x120
[ 3852.217768] ? __x64_sys_close+0x38/0x80
[ 3852.217771] ? kmem_cache_free+0x3bf/0x3e0
[ 3852.217774] ? syscall_exit_to_user_mode+0x49/0x210
[ 3852.217777] ? do_syscall_64+0x8a/0x190
[ 3852.217780] ? do_syscall_64+0x8a/0x190
[ 3852.217783] ? __irq_exit_rcu+0x3e/0xe0
[ 3852.217785] entry_SYSCALL_64_after_hwframe+0x76/0x7e
Hello David,
Thanks for the report and the details. Reproducer information helps me
stabilize the code quickly. The micro-benchmark I used did not show any
issues. I will add PTL locking and also check the issue from my side.
(With multiple scanning threads, it could cause even more issues because
of the higher migration pressure. I am wondering if I should go ahead with
a more stable single-thread scanning version in the coming post.)
Thanks and Regards
- Raghu