On Wed, Oct 30, 2024 at 10:48PM +0100, Vlastimil Babka wrote: > On 10/30/24 22:05, Paul E. McKenney wrote: > > Hello! > > Hi! > > > The next-20241030 release gets the splat shown below when running > > scftorture in a preemptible kernel. This bisects to this commit: > > > > 560af5dc839e ("lockdep: Enable PROVE_RAW_LOCK_NESTING with PROVE_LOCKING") > > > > Except that all this is doing is enabling lockdep to find the problem. > > > > The obvious way to fix this is to make the kmem_cache structure's > > cpu_slab field's ->lock be a raw spinlock, but this might not be what > > we want for real-time response. > > But it's a local_lock, not spinlock and it's doing local_lock_irqsave(). I'm > confused what's happening here, the code has been like this for years now. > > > This can be reproduced deterministically as follows: > > > > tools/testing/selftests/rcutorture/bin/kvm.sh --torture scf --allcpus --duration 2 --configs PREEMPT --kconfig CONFIG_NR_CPUS=64 --memory 7G --trust-make --kasan --bootargs "scftorture.nthreads=64 torture.disable_onoff_at_boot csdlock_debug=1" > > > > I doubt that the number of CPUs or amount of memory makes any difference, > > but that is what I used. > > > > Thoughts? > > > > Thanx, Paul > > > > ------------------------------------------------------------------------ > > > > [ 35.659746] ============================= > > [ 35.659746] [ BUG: Invalid wait context ] > > [ 35.659746] 6.12.0-rc5-next-20241029 #57233 Not tainted > > [ 35.659746] ----------------------------- > > [ 35.659746] swapper/37/0 is trying to lock: > > [ 35.659746] ffff8881ff4bf2f0 (&c->lock){....}-{3:3}, at: put_cpu_partial+0x49/0x1b0 > > [ 35.659746] other info that might help us debug this: > > [ 35.659746] context-{2:2} > > [ 35.659746] no locks held by swapper/37/0. > > [ 35.659746] stack backtrace: > > [ 35.659746] CPU: 37 UID: 0 PID: 0 Comm: swapper/37 Not tainted 6.12.0-rc5-next-20241029 #57233 > > [ 35.659746] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 > > [ 35.659746] Call Trace: > > [ 35.659746] <IRQ> > > [ 35.659746] dump_stack_lvl+0x68/0xa0 > > [ 35.659746] __lock_acquire+0x8fd/0x3b90 > > [ 35.659746] ? start_secondary+0x113/0x210 > > [ 35.659746] ? __pfx___lock_acquire+0x10/0x10 > > [ 35.659746] ? __pfx___lock_acquire+0x10/0x10 > > [ 35.659746] ? __pfx___lock_acquire+0x10/0x10 > > [ 35.659746] ? __pfx___lock_acquire+0x10/0x10 > > [ 35.659746] lock_acquire+0x19b/0x520 > > [ 35.659746] ? put_cpu_partial+0x49/0x1b0 > > [ 35.659746] ? __pfx_lock_acquire+0x10/0x10 > > [ 35.659746] ? __pfx_lock_release+0x10/0x10 > > [ 35.659746] ? lock_release+0x20f/0x6f0 > > [ 35.659746] ? __pfx_lock_release+0x10/0x10 > > [ 35.659746] ? lock_release+0x20f/0x6f0 > > [ 35.659746] ? kasan_save_track+0x14/0x30 > > [ 35.659746] put_cpu_partial+0x52/0x1b0 > > [ 35.659746] ? put_cpu_partial+0x49/0x1b0 > > [ 35.659746] ? __pfx_scf_handler_1+0x10/0x10 > > [ 35.659746] __flush_smp_call_function_queue+0x2d2/0x600 > > How did we even get to put_cpu_partial directly from flushing smp calls? > SLUB doesn't use them, it uses queue_work_on)_ for flushing and that > flushing doesn't involve put_cpu_partial() AFAIK. > > I think only slab allocation or free can lead to put_cpu_partial() that > would mean the backtrace is missing something. And that somebody does a slab > alloc/free from a smp callback, which I'd then assume isn't allowed? Tail-call optimization is hiding the caller. Compiling with -fno-optimize-sibling-calls exposes the caller. This gives the full picture: [ 40.321505] ============================= [ 40.322711] [ BUG: Invalid wait context ] [ 40.323927] 6.12.0-rc5-next-20241030-dirty #4 Not tainted [ 40.325502] ----------------------------- [ 40.326653] cpuhp/47/253 is trying to lock: [ 40.327869] ffff8881ff9bf2f0 (&c->lock){....}-{3:3}, at: put_cpu_partial+0x48/0x1a0 [ 40.330081] other info that might help us debug this: [ 40.331540] context-{2:2} [ 40.332305] 3 locks held by cpuhp/47/253: [ 40.333468] #0: ffffffffae6e6910 (cpu_hotplug_lock){++++}-{0:0}, at: cpuhp_thread_fun+0xe0/0x590 [ 40.336048] #1: ffffffffae6e9060 (cpuhp_state-down){+.+.}-{0:0}, at: cpuhp_thread_fun+0xe0/0x590 [ 40.338607] #2: ffff8881002a6948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove_by_name_ns+0x78/0x100 [ 40.341454] stack backtrace: [ 40.342291] CPU: 47 UID: 0 PID: 253 Comm: cpuhp/47 Not tainted 6.12.0-rc5-next-20241030-dirty #4 [ 40.344807] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 40.347482] Call Trace: [ 40.348199] <IRQ> [ 40.348827] dump_stack_lvl+0x6b/0xa0 [ 40.349899] dump_stack+0x10/0x20 [ 40.350850] __lock_acquire+0x900/0x4010 [ 40.360290] lock_acquire+0x191/0x4f0 [ 40.364850] put_cpu_partial+0x51/0x1a0 [ 40.368341] scf_handler+0x1bd/0x290 [ 40.370590] scf_handler_1+0x4e/0xb0 [ 40.371630] __flush_smp_call_function_queue+0x2dd/0x600 [ 40.373142] generic_smp_call_function_single_interrupt+0xe/0x20 [ 40.374801] __sysvec_call_function_single+0x50/0x280 [ 40.376214] sysvec_call_function_single+0x6c/0x80 [ 40.377543] </IRQ> [ 40.378142] <TASK> And scf_handler does indeed tail-call kfree: static void scf_handler(void *scfc_in) { [...] } else { kfree(scfcp); } }