On 7/15/21 6:34 PM, Mike Galbraith wrote: > Greetings crickets, > > Methinks he problem is the hole these patches opened only for RT. > > static void put_cpu_partial(struct kmem_cache *s, struct page *page, > int drain) > { > #ifdef CONFIG_SLUB_CPU_PARTIAL > struct page *oldpage; > int pages; > int pobjects; > > slub_get_cpu_ptr(s->cpu_slab); > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > That is an assertion that the stuff under it is preempt safe for RT and > ONLY RT. My box says the RT portion of that assertion is horse pookey. > What it does is let kmem_cache_free()/kfree() blast straight through > ___slab_alloc() critical sections, swapping out ->partial underneath > it. Sprinkling percpu fingerprint power about, I saw lots of > ___slab_alloc() preempts put_cpu_partial().. and vice versa. I don't > think it's a coincidence that ___slab_alloc() and __unfreeze_partials() > both explode trying to access page->freelist. You've seen the former, > here's one of the later. > > crash> bt -sx > PID: 18761 TASK: ffff88812fff0000 CPU: 0 COMMAND: "hackbench" > #0 [ffff88818f8ff980] machine_kexec+0x14f at ffffffff81051c8f > #1 [ffff88818f8ff9c8] __crash_kexec+0xd2 at ffffffff8111ef72 > #2 [ffff88818f8ffa88] crash_kexec+0x30 at ffffffff8111fd10 > #3 [ffff88818f8ffa98] oops_end+0xd3 at ffffffff810267e3 > #4 [ffff88818f8ffab8] exc_general_protection+0x195 at ffffffff8179fdb5 > #5 [ffff88818f8ffb50] asm_exc_general_protection+0x1e at ffffffff81800a0e > [exception RIP: __unfreeze_partials+156] Hm going back to this report, if that's a direct result of __slab_alloc preempting (interrupting) put_cpu_partial() then I think that's something that could happen even on !RT as all we do in the put_cpu_partial() there is disable preemption, and the allocation could come in irq handler. It would mean the whole idea of "mm, slub: detach percpu partial list in unfreeze_partials() using this_cpu_cmpxchg()" isn't safe. I'll have to ponder it. But we didn't see crashes on !RT yet. So could it be that it was still put_cpu_partial() preempting __slab_alloc() messing the partial list, but for some reason the put_cpu_partial() side crashed this time? > RIP: ffffffff81248bac RSP: ffff88818f8ffc00 RFLAGS: 00010202 > RAX: 0000000000200002 RBX: 0000000000200002 RCX: 000000017fffffff > RDX: 00000001ffffffff RSI: 0000000000000202 RDI: ffff888100040b80 > RBP: ffff88818f8ffca0 R8: ffff88818f9cba30 R9: 0000000000000001 > R10: ffff88818f8ffcc0 R11: 0000000000000000 R12: ffff888100043700 > R13: ffff888100040b80 R14: 00000001ffffffff R15: ffffea00046c0808 > ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018 > #6 [ffff88818f8ffcb8] put_cpu_partial+0x8e at ffffffff81248ece > #7 [ffff88818f8ffcd8] consume_skb+0x2b at ffffffff815eddeb > #8 [ffff88818f8ffce8] unix_stream_read_generic+0x788 at ffffffff8170b238 > #9 [ffff88818f8ffdc0] unix_stream_recvmsg+0x43 at ffffffff8170b433 > #10 [ffff88818f8ffdf8] sock_read_iter+0x104 at ffffffff815dd554 > #11 [ffff88818f8ffe68] new_sync_read+0x16a at ffffffff812674fa > #12 [ffff88818f8ffee8] vfs_read+0x1ae at ffffffff81269c8e > #13 [ffff88818f8fff00] ksys_read+0x40 at ffffffff8126a070 > #14 [ffff88818f8fff38] do_syscall_64+0x37 at ffffffff8179f5e7 > #15 [ffff88818f8fff50] entry_SYSCALL_64_after_hwframe+0x44 at ffffffff8180007c > RIP: 00007fbda4438f2e RSP: 00007fff3bf9d798 RFLAGS: 00000246 > RAX: ffffffffffffffda RBX: 00007fff3bf9e7a0 RCX: 00007fbda4438f2e > RDX: 0000000000001000 RSI: 00007fff3bf9d7a0 RDI: 0000000000000007 > RBP: 00007fff3bf9e800 R8: 00007fff3bf9e6e0 R9: 00007fbda4641010 > R10: 00007fbda4428028 R11: 0000000000000246 R12: 0000000000001000 > crash> dis __unfreeze_partials+156 > 0xffffffff81248bac <__unfreeze_partials+156>: lock cmpxchg16b 0x20(%r15) > crash> gdb list *__unfreeze_partials+156 > 0xffffffff81248bac is in __unfreeze_partials (mm/slub.c:475). > 470 if (!disable_irqs) > 471 lockdep_assert_irqs_disabled(); > 472 #if defined(CONFIG_HAVE_CMPXCHG_DOUBLE) && \ > 473 defined(CONFIG_HAVE_ALIGNED_STRUCT_PAGE) > 474 if (s->flags & __CMPXCHG_DOUBLE) { > 475 if (cmpxchg_double(&page->freelist, &page->counters, > 476 freelist_old, counters_old, > 477 freelist_new, counters_new)) > 478 return true; > 479 } else > crash> kmem ffffea00046c0808 > PAGE PHYSICAL MAPPING INDEX CNT FLAGS > ffffea00104b3000 412cc0000 0 c 2 8000000000003000 reserved,private > crash > > Regarding $subject, Lockdep thinks my hole plugging skills suck rocks > (shrug, I've given it plentiful cause) but that's ok, I sometimes think > the same of its bug reporting skills :) > > [ 2.459456] ============================================ > [ 2.459456] WARNING: possible recursive locking detected > [ 2.459457] 5.14.0.g79e92006-tip-rt #83 Tainted: G E > [ 2.459458] -------------------------------------------- > [ 2.459458] rcuc/7/56 is trying to acquire lock: > [ 2.459459] ffff88840edf01c0 (&(&c->lock)->lock){+.+.}-{0:0}, at: kfree+0x280/0x670 > [ 2.459466] > but task is already holding lock: > [ 2.459466] ffff88840edf4d60 (&(&c->lock)->lock){+.+.}-{0:0}, at: __slab_free+0x4d6/0x600 > [ 2.459469] > > live kernel snooping.... > > crash> ps | grep UN > crash> bt -sx 56 > PID: 56 TASK: ffff888100c19a40 CPU: 7 COMMAND: "rcuc/7" > #0 [ffff888100c63e40] __schedule+0x2eb at ffffffff818969fb > #1 [ffff888100c63ec8] schedule+0x3b at ffffffff8189723b > #2 [ffff888100c63ee0] smpboot_thread_fn+0x18c at ffffffff810a90dc > #3 [ffff888100c63f18] kthread+0x1af at ffffffff810a27bf > #4 [ffff888100c63f50] ret_from_fork+0x1f at ffffffff81001cbf > crash> task_struct ffff888100c19a40 | grep __state > __state = 1, > crash> gdb list *__slab_free+0x4d6 > 0xffffffff812923c6 is in __slab_free (mm/slub.c:2568). > 2563 /* > 2564 * partial array is full. Move the existing > 2565 * set to the per node partial list. > 2566 */ > 2567 local_lock(&s->cpu_slab->lock); > 2568 unfreeze_partials(s); > 2569 local_unlock(&s->cpu_slab->lock); > 2570 oldpage = NULL; > 2571 pobjects = 0; > 2572 pages = 0; > crash> gdb list *kfree+0x280 > 0xffffffff81292770 is in kfree (mm/slub.c:3442). > 3437 * __slab_free() as that wouldn't use the cpu freelist at all. > 3438 */ > 3439 void **freelist; > 3440 > 3441 local_lock(&s->cpu_slab->lock); > 3442 c = this_cpu_ptr(s->cpu_slab); > 3443 if (unlikely(page != c->page)) { > 3444 local_unlock(&s->cpu_slab->lock); > 3445 goto redo; > 3446 } > crash> > > Check, those are what's in the stacktrace below... but the allegedly > deadlocked for real task is very much alive, as is the rest of box. > > other info that might help us debug this: > [ 2.459470] Possible unsafe locking scenario: > > [ 2.459470] CPU0 > [ 2.459470] ---- > [ 2.459471] lock(&(&c->lock)->lock); > [ 2.459471] lock(&(&c->lock)->lock); > [ 2.459472] > *** DEADLOCK *** > > [ 2.459472] May be due to missing lock nesting notation > > [ 2.459472] 6 locks held by rcuc/7/56: > [ 2.459473] #0: ffff88840edd9820 ((softirq_ctrl.lock).lock){+.+.}-{2:2}, at: __local_bh_disable_ip+0xc3/0x190 > [ 2.459479] #1: ffffffff82382b80 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xd0 > [ 2.459484] #2: ffffffff82382b80 (rcu_read_lock){....}-{1:2}, at: __local_bh_disable_ip+0xa0/0x190 > [ 2.459487] #3: ffffffff82382ac0 (rcu_callback){....}-{0:0}, at: rcu_do_batch+0x195/0x520 > [ 2.459490] #4: ffff88840edf4d60 (&(&c->lock)->lock){+.+.}-{0:0}, at: __slab_free+0x4d6/0x600 > [ 2.459493] #5: ffffffff82382b80 (rcu_read_lock){....}-{1:2}, at: rt_spin_lock+0x5/0xd0 > [ 2.459496] > stack backtrace: > [ 2.459497] CPU: 7 PID: 56 Comm: rcuc/7 Tainted: G E 5.14.0.g79e92006-tip-rt #83 > [ 2.459498] Hardware name: MEDION MS-7848/MS-7848, BIOS M7848W08.20C 09/23/2013 > [ 2.459499] Call Trace: > [ 2.459501] dump_stack_lvl+0x44/0x57 > [ 2.459504] __lock_acquire+0xb7f/0x1ab0 > [ 2.459508] lock_acquire+0x2a6/0x340 > [ 2.459510] ? kfree+0x280/0x670 > [ 2.459513] ? __free_slab+0xa4/0x1b0 > [ 2.459515] rt_spin_lock+0x2a/0xd0 > [ 2.459516] ? kfree+0x280/0x670 somewhere in the multiverse lies a dead rcuc/7 > [ 2.459518] kfree+0x280/0x670 <== local_lock(&s->cpu_slab->lock); #2 > [ 2.459521] __free_slab+0xa4/0x1b0 ==> kfree(page_objcgs(page)) > [ 2.459523] __unfreeze_partials+0x1d8/0x330 ==> discard_slab(s, page); > [ 2.459526] ? _raw_spin_unlock_irqrestore+0x30/0x80 > [ 2.459530] ? __slab_free+0x4de/0x600 > [ 2.459532] __slab_free+0x4de/0x600 <== local_lock(&s->cpu_slab->lock); #1 > [ 2.459534] ? find_held_lock+0x2d/0x90 > [ 2.459536] ? kmem_cache_free+0x276/0x630 > [ 2.459539] ? rcu_do_batch+0x1c3/0x520 > [ 2.459540] ? kmem_cache_free+0x364/0x630 > [ 2.459542] kmem_cache_free+0x364/0x630 > [ 2.459544] ? rcu_do_batch+0x195/0x520 > [ 2.459546] rcu_do_batch+0x1c3/0x520 > [ 2.459547] ? rcu_do_batch+0x195/0x520 > [ 2.459550] ? rcu_cpu_kthread+0x7e/0x4b0 > [ 2.459552] ? rcu_cpu_kthread+0x57/0x4b0 > [ 2.459553] rcu_core+0x2c3/0x590 > [ 2.459555] ? rcu_cpu_kthread+0x78/0x4b0 > [ 2.459557] ? rcu_cpu_kthread+0x7e/0x4b0 > [ 2.459558] ? rcu_cpu_kthread+0x57/0x4b0 > [ 2.459560] rcu_cpu_kthread+0xc2/0x4b0 > [ 2.459562] ? smpboot_thread_fn+0x23/0x300 > [ 2.459565] smpboot_thread_fn+0x249/0x300 > [ 2.459567] ? smpboot_register_percpu_thread+0xe0/0xe0 > [ 2.459569] kthread+0x1af/0x1d0 > [ 2.459571] ? set_kthread_struct+0x40/0x40 > [ 2.459574] ret_from_fork+0x1f/0x30 >