On Mon, Feb 05, 2024 at 07:46:48AM -1000, Tejun Heo wrote:
> On Mon, Feb 05, 2024 at 09:45:53AM -0800, Paul E. McKenney wrote:
> > On Mon, Feb 05, 2024 at 10:25:15PM +0900, Sergey Senozhatsky wrote:
> > > On (24/02/05 14:07), Petr Mladek wrote:
> > > > > Good point, if it does recur, I could try it on bare metal.
> > > >
> > > > Please let me, John, and Sergey know if anyone sees this again. I do
> > > > not feel comfortable when there is a problem which might make consoles
> > > > go silent.
> > >
> > > Agreed.
> > >
> > > > > Bisection identified this commit:
> > > > > 5797b1c18919 ("workqueue: Implement system-wide nr_active enforcement for unbound workqueues")
> > >
> > > That commit triggered an early boot use-after-free (per KASAN) on
> > > my system, which probably could derail some things.
> >
> > And enabling KASAN on next-20240130 got me that same KASAN report and
> > also suppressed the misbehavior, which is not surprising given that
> > KASAN quarantines freed memory for some time. Plus enabling KASAN
> > on recent -next does not trigger that KASAN report.
> >
> > So my guess is that we can attribute my oddball test failures to
> > that use-after-free. But I will of course continue testing.
>
> Can someone paste the KASAN report?

Here you go!

							Thanx, Paul

------------------------------------------------------------------------

[    0.316453] ==================================================================
[    0.317646] BUG: KASAN: use-after-free in wq_update_node_max_active+0x123/0x810
[    0.318851] Read of size 8 at addr ffff88802109d788 by task swapper/0/0
[    0.319937]
[    0.320195] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 6.8.0-rc2-next-20240130 #7935
[    0.321453] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
[    0.323299] Call Trace:
[    0.323700]  <TASK>
[    0.324043]  dump_stack_lvl+0x37/0x50
[    0.324653]  print_report+0xcb/0x620
[    0.325249]  ? wq_update_node_max_active+0x123/0x810
[    0.326066]  kasan_report+0xaf/0xe0
[    0.326639]  ? wq_update_node_max_active+0x123/0x810
[    0.327455]  kasan_check_range+0x39/0x1c0
[    0.328119]  wq_update_node_max_active+0x123/0x810
[    0.328903]  ? __pfx_mutex_lock+0x10/0x10
[    0.329567]  apply_wqattrs_commit+0x4e4/0xb80
[    0.330289]  ? __pfx_mutex_lock+0x10/0x10
[    0.330946]  apply_workqueue_attrs_locked+0x9e/0x110
[    0.331764]  alloc_workqueue+0xf76/0x18d0
[    0.332432]  ? __pfx_alloc_workqueue+0x10/0x10
[    0.333189]  ? kasan_unpoison+0x27/0x60
[    0.333818]  ? kasan_unpoison+0x27/0x60
[    0.334455]  ? __kasan_slab_alloc+0x30/0x70
[    0.335147]  ? __pfx_mutex_unlock+0x10/0x10
[    0.335831]  ? idr_alloc_u32+0x291/0x2c0
[    0.336479]  ? mutex_unlock+0x7e/0xd0
[    0.337085]  workqueue_init_early+0x69a/0xe70
[    0.337800]  ? __pfx_workqueue_init_early+0x10/0x10
[    0.338605]  ? kmem_cache_create_usercopy+0xcc/0x230
[    0.339421]  start_kernel+0x141/0x380
[    0.340023]  x86_64_start_reservations+0x18/0x30
[    0.340788]  x86_64_start_kernel+0xcf/0xe0
[    0.341465]  secondary_startup_64_no_verify+0x16d/0x17b
[    0.342334]  </TASK>
[    0.342703]
[    0.342954] The buggy address belongs to the physical page:
[    0.343899] page:00000000a19a7ad3 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x2109d
[    0.345471] flags: 0x100000000000000(node=0|zone=1)
[    0.346297] page_type: 0xffffffff()
[    0.346882] raw: 0100000000000000 ffffea0000842748 ffffea0000842748 0000000000000000
[    0.348184] raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000
[    0.349518] page dumped because: kasan: bad access detected
[    0.350457]
[    0.350706] Memory state around the buggy address:
[    0.351532]  ffff88802109d680: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.352748]  ffff88802109d700: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.353968] >ffff88802109d780: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.355221]                            ^
[    0.355808]  ffff88802109d800: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.357161]  ffff88802109d880: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff
[    0.358439] ==================================================================