On Wed, Oct 30, 2024 at 05:50:15PM +0800, Ming Lei wrote: > On Wed, Oct 30, 2024 at 4:51 PM Lai, Yi <yi1.lai@xxxxxxxxxxxxxxx> wrote: > > > > On Wed, Oct 30, 2024 at 03:13:09PM +0800, Ming Lei wrote: > > > On Wed, Oct 30, 2024 at 02:45:03PM +0800, Lai, Yi wrote: > ... > > > > > > It should be addressed by the following patch: > > > > > > https://lore.kernel.org/linux-block/ZyEGLdg744U_xBjp@fedora/ > > > > > > > I have applied proposed fix patch on top of next-20241029. Issue can > > still be reproduced. > > > > It seems the dependency chain is different from Marek's log and mine. > > Can you post the new log since q->q_usage_counter(io)->fs_reclaim from > blk_mq_init_sched is cut down by the patch? > New possible deadlock log after patch applied: [ 52.485023] repro: page allocation failure: order:1, mode:0x10cc0(GFP_KERNEL|__GFP_NORETRY), nodemask=(null),cpuset=/,mems0 [ 52.486074] CPU: 1 UID: 0 PID: 635 Comm: repro Not tainted 6.12.0-rc5-next-20241029-kvm-dirty #6 [ 52.486752] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/014 [ 52.487616] Call Trace: [ 52.487820] <TASK> [ 52.488001] dump_stack_lvl+0x121/0x150 [ 52.488345] dump_stack+0x19/0x20 [ 52.488616] warn_alloc+0x218/0x350 [ 52.488913] ? __pfx_warn_alloc+0x10/0x10 [ 52.489263] ? __alloc_pages_direct_compact+0x130/0xa10 [ 52.489699] ? __pfx___alloc_pages_direct_compact+0x10/0x10 [ 52.490151] ? __drain_all_pages+0x27b/0x480 [ 52.490522] __alloc_pages_slowpath.constprop.0+0x14d6/0x21e0 [ 52.491018] ? __pfx___alloc_pages_slowpath.constprop.0+0x10/0x10 [ 52.491519] ? lock_is_held_type+0xef/0x150 [ 52.491875] ? __pfx_get_page_from_freelist+0x10/0x10 [ 52.492291] ? lock_acquire+0x80/0xb0 [ 52.492619] __alloc_pages_noprof+0x5d4/0x6f0 [ 52.492992] ? __pfx___alloc_pages_noprof+0x10/0x10 [ 52.493405] ? __sanitizer_cov_trace_switch+0x58/0xa0 [ 52.493830] ? policy_nodemask+0xf9/0x450 [ 52.494169] alloc_pages_mpol_noprof+0x30a/0x580 [ 52.494561] ? __pfx_alloc_pages_mpol_noprof+0x10/0x10 [ 52.494982] ? sysvec_apic_timer_interrupt+0x6a/0xd0 [ 52.495396] ? asm_sysvec_apic_timer_interrupt+0x1f/0x30 [ 52.495845] alloc_pages_noprof+0xa9/0x180 [ 52.496201] kimage_alloc_pages+0x79/0x240 [ 52.496558] kimage_alloc_control_pages+0x1cb/0xa60 [ 52.496982] ? __pfx_kimage_alloc_control_pages+0x10/0x10 [ 52.497437] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30 [ 52.497897] do_kexec_load+0x3a6/0x8e0 [ 52.498228] ? __pfx_do_kexec_load+0x10/0x10 [ 52.498593] ? __sanitizer_cov_trace_const_cmp8+0x1c/0x30 [ 52.499035] ? _copy_from_user+0xb6/0xf0 [ 52.499371] __x64_sys_kexec_load+0x1cc/0x240 [ 52.499740] x64_sys_call+0xf0f/0x20d0 [ 52.500055] do_syscall_64+0x6d/0x140 [ 52.500367] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 52.500778] RIP: 0033:0x7f310423ee5d [ 52.501077] Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c88 [ 52.502494] RSP: 002b:00007fffcecca558 EFLAGS: 00000207 ORIG_RAX: 00000000000000f6 [ 52.503087] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007f310423ee5d [ 52.503644] RDX: 0000000020000040 RSI: 0000000000000009 RDI: 0000000000000000 [ 52.504198] RBP: 00007fffcecca560 R08: 00007fffcecc9fd0 R09: 00007fffcecca590 [ 52.504767] R10: 0000000000000000 R11: 0000000000000207 R12: 00007fffcecca6d8 [ 52.505345] R13: 0000000000401c72 R14: 0000000000403e08 R15: 00007f3104469000 [ 52.505949] </TASK> [ 52.506239] Mem-Info: [ 52.506449] active_anon:119 inactive_anon:14010 isolated_anon:0 [ 52.506449] active_file:17895 inactive_file:87 isolated_file:0 [ 52.506449] unevictable:0 dirty:15 writeback:0 [ 52.506449] slab_reclaimable:6957 slab_unreclaimable:20220 [ 52.506449] mapped:11598 shmem:1150 pagetables:766 [ 52.506449] sec_pagetables:0 bounce:0 [ 52.506449] kernel_misc_reclaimable:0 [ 52.506449] free:13776 free_pcp:99 free_cma:0 [ 52.509456] Node 0 active_anon:476kB inactive_anon:56040kB active_file:71580kB inactive_file:348kB unevictable:0kB isolateo [ 52.511881] Node 0 DMA free:440kB boost:0kB min:440kB low:548kB high:656kB reserved_highatomic:0KB active_anon:0kB inactivB [ 52.513883] lowmem_reserve[]: 0 1507 0 0 0 [ 52.514269] Node 0 DMA32 free:54664kB boost:0kB min:44612kB low:55764kB high:66916kB reserved_highatomic:0KB active_anon:4B [ 52.516485] lowmem_reserve[]: 0 0 0 0 0 [ 52.516831] Node 0 DMA: 0*4kB 1*8kB (U) 1*16kB (U) 1*32kB (U) 0*64kB 1*128kB (U) 1*256kB (U) 0*512kB 0*1024kB 0*2048kB 0*4B [ 52.517895] Node 0 DMA32: 2970*4kB (UME) 1123*8kB (UME) 532*16kB (UME) 280*32kB (UM) 126*64kB (UM) 27*128kB (UME) 9*256kB B [ 52.519279] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=1048576kB [ 52.519387] [ 52.519971] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB [ 52.520138] ====================================================== [ 52.520805] 19113 total pagecache pages [ 52.521702] WARNING: possible circular locking dependency detected [ 52.522016] 0 pages in swap cache [ 52.522022] Free swap = 124984kB [ 52.522027] Total swap = 124996kB [ 52.523720] 6.12.0-rc5-next-20241029-kvm-dirty #6 Not tainted [ 52.523741] ------------------------------------------------------ [ 52.524059] 524158 pages RAM [ 52.525050] kswapd0/56 is trying to acquire lock: [ 52.525452] 0 pages HighMem/MovableOnly [ 52.525461] 129765 pages reserved [ 52.525465] 0 pages cma reserved [ 52.525469] 0 pages hwpoisoned [ 52.527163] ffff8880104374e8 (&q->q_usage_counter(io)#25){++++}-{0:0}, at: __submit_bio+0x39f/0x550 [ 52.532396] [ 52.532396] but task is already holding lock: [ 52.533293] ffffffff872322a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xb0f/0x1500 [ 52.534508] [ 52.534508] which lock already depends on the new lock. [ 52.534508] [ 52.535723] [ 52.535723] the existing dependency chain (in reverse order) is: [ 52.536818] [ 52.536818] -> #2 (fs_reclaim){+.+.}-{0:0}: [ 52.537705] lock_acquire+0x80/0xb0 [ 52.538337] fs_reclaim_acquire+0x116/0x160 [ 52.539076] blk_mq_alloc_and_init_hctx+0x4df/0x1200 [ 52.539906] blk_mq_realloc_hw_ctxs+0x4cf/0x610 [ 52.540676] blk_mq_init_allocated_queue+0x3da/0x11b0 [ 52.541547] blk_mq_alloc_queue+0x22c/0x300 [ 52.542279] __blk_mq_alloc_disk+0x34/0x100 [ 52.543011] loop_add+0x4c9/0xbd0 [ 52.543622] loop_init+0x133/0x1a0 [ 52.544248] do_one_initcall+0x114/0x5d0 [ 52.544954] kernel_init_freeable+0xab0/0xeb0 [ 52.545732] kernel_init+0x28/0x2f0 [ 52.546366] ret_from_fork+0x56/0x90 [ 52.547009] ret_from_fork_asm+0x1a/0x30 [ 52.547698] [ 52.547698] -> #1 (&q->sysfs_lock){+.+.}-{4:4}: [ 52.548625] lock_acquire+0x80/0xb0 [ 52.549276] __mutex_lock+0x17c/0x1540 [ 52.549958] mutex_lock_nested+0x1f/0x30 [ 52.550664] queue_attr_store+0xea/0x180 [ 52.551360] sysfs_kf_write+0x11f/0x180 [ 52.552036] kernfs_fop_write_iter+0x40e/0x630 [ 52.552808] vfs_write+0xc59/0x1140 [ 52.553446] ksys_write+0x14f/0x290 [ 52.554068] __x64_sys_write+0x7b/0xc0 [ 52.554728] x64_sys_call+0x1685/0x20d0 [ 52.555397] do_syscall_64+0x6d/0x140 [ 52.556029] entry_SYSCALL_64_after_hwframe+0x76/0x7e [ 52.556865] [ 52.556865] -> #0 (&q->q_usage_counter(io)#25){++++}-{0:0}: [ 52.557963] __lock_acquire+0x2ff8/0x5d60 [ 52.558667] lock_acquire.part.0+0x142/0x390 [ 52.559427] lock_acquire+0x80/0xb0 [ 52.560057] blk_mq_submit_bio+0x1cbe/0x2590 [ 52.560801] __submit_bio+0x39f/0x550 [ 52.561473] submit_bio_noacct_nocheck+0x647/0xcc0 [ 52.562285] submit_bio_noacct+0x620/0x1e00 [ 52.563017] submit_bio+0xce/0x480 [ 52.563637] __swap_writepage+0x2f1/0xdf0 [ 52.564349] swap_writepage+0x464/0xbc0 [ 52.565022] shmem_writepage+0xdeb/0x1340 [ 52.565745] pageout+0x3bc/0x9b0 [ 52.566353] shrink_folio_list+0x16b9/0x3b60 [ 52.567104] shrink_lruvec+0xd78/0x2790 [ 52.567794] shrink_node+0xb29/0x2870 [ 52.568454] balance_pgdat+0x9c2/0x1500 [ 52.569142] kswapd+0x765/0xe00 [ 52.569741] kthread+0x35a/0x470 [ 52.570340] ret_from_fork+0x56/0x90 [ 52.570993] ret_from_fork_asm+0x1a/0x30 [ 52.571696] [ 52.571696] other info that might help us debug this: [ 52.571696] [ 52.572904] Chain exists of: [ 52.572904] &q->q_usage_counter(io)#25 --> &q->sysfs_lock --> fs_reclaim [ 52.572904] [ 52.574631] Possible unsafe locking scenario: [ 52.574631] [ 52.575547] CPU0 CPU1 [ 52.576246] ---- ---- [ 52.576942] lock(fs_reclaim); [ 52.577467] lock(&q->sysfs_lock); [ 52.578382] lock(fs_reclaim); [ 52.579250] rlock(&q->q_usage_counter(io)#25); [ 52.579974] [ 52.579974] *** DEADLOCK *** [ 52.579974] [ 52.580866] 1 lock held by kswapd0/56: [ 52.581459] #0: ffffffff872322a0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xb0f/0x1500 [ 52.582731] [ 52.582731] stack backtrace: [ 52.583404] CPU: 0 UID: 0 PID: 56 Comm: kswapd0 Not tainted 6.12.0-rc5-next-20241029-kvm-dirty #6 [ 52.584735] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/014 [ 52.586439] Call Trace: [ 52.586836] <TASK> [ 52.587190] dump_stack_lvl+0xea/0x150 [ 52.587753] dump_stack+0x19/0x20 [ 52.588253] print_circular_bug+0x47f/0x750 [ 52.588872] check_noncircular+0x2f4/0x3e0 [ 52.589492] ? __pfx_check_noncircular+0x10/0x10 [ 52.590180] ? lockdep_lock+0xd0/0x1d0 [ 52.590741] ? __pfx_lockdep_lock+0x10/0x10 [ 52.590790] kexec: Could not allocate control_code_buffer [ 52.591341] __lock_acquire+0x2ff8/0x5d60 [ 52.592365] ? __pfx___lock_acquire+0x10/0x10 [ 52.593087] ? __pfx_mark_lock.part.0+0x10/0x10 [ 52.593753] ? __kasan_check_read+0x15/0x20 [ 52.594366] lock_acquire.part.0+0x142/0x390 [ 52.594989] ? __submit_bio+0x39f/0x550 [ 52.595554] ? __pfx_lock_acquire.part.0+0x10/0x10 [ 52.596246] ? debug_smp_processor_id+0x20/0x30 [ 52.596900] ? rcu_is_watching+0x19/0xc0 [ 52.597484] ? trace_lock_acquire+0x139/0x1b0 [ 52.598118] lock_acquire+0x80/0xb0 [ 52.598633] ? __submit_bio+0x39f/0x550 [ 52.599191] blk_mq_submit_bio+0x1cbe/0x2590 [ 52.599805] ? __submit_bio+0x39f/0x550 [ 52.600361] ? __kasan_check_read+0x15/0x20 [ 52.600966] ? __pfx_blk_mq_submit_bio+0x10/0x10 [ 52.601632] ? __pfx_mark_lock.part.0+0x10/0x10 [ 52.602285] ? __this_cpu_preempt_check+0x21/0x30 [ 52.602968] ? __this_cpu_preempt_check+0x21/0x30 [ 52.603646] ? lock_release+0x441/0x870 [ 52.604207] __submit_bio+0x39f/0x550 [ 52.604742] ? __pfx___submit_bio+0x10/0x10 [ 52.605364] ? __this_cpu_preempt_check+0x21/0x30 [ 52.606045] ? seqcount_lockdep_reader_access.constprop.0+0xb4/0xd0 [ 52.606940] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30 [ 52.607707] ? kvm_clock_get_cycles+0x43/0x70 [ 52.608345] submit_bio_noacct_nocheck+0x647/0xcc0 [ 52.609045] ? __pfx_submit_bio_noacct_nocheck+0x10/0x10 [ 52.609820] ? __sanitizer_cov_trace_switch+0x58/0xa0 [ 52.610552] submit_bio_noacct+0x620/0x1e00 [ 52.611167] submit_bio+0xce/0x480 [ 52.611677] __swap_writepage+0x2f1/0xdf0 [ 52.612267] swap_writepage+0x464/0xbc0 [ 52.612837] shmem_writepage+0xdeb/0x1340 [ 52.613441] ? __pfx_shmem_writepage+0x10/0x10 [ 52.614090] ? __kasan_check_write+0x18/0x20 [ 52.614716] ? folio_clear_dirty_for_io+0xc1/0x600 [ 52.615403] pageout+0x3bc/0x9b0 [ 52.615894] ? __pfx_pageout+0x10/0x10 [ 52.616471] ? __pfx_folio_referenced_one+0x10/0x10 [ 52.617169] ? __pfx_folio_lock_anon_vma_read+0x10/0x10 [ 52.617918] ? __pfx_invalid_folio_referenced_vma+0x10/0x10 [ 52.618713] shrink_folio_list+0x16b9/0x3b60 [ 52.619346] ? __pfx_shrink_folio_list+0x10/0x10 [ 52.620021] ? __this_cpu_preempt_check+0x21/0x30 [ 52.620713] ? mark_lock.part.0+0xf3/0x17b0 [ 52.621339] ? isolate_lru_folios+0xcb1/0x1250 [ 52.621991] ? __pfx_mark_lock.part.0+0x10/0x10 [ 52.622655] ? __this_cpu_preempt_check+0x21/0x30 [ 52.623335] ? lock_release+0x441/0x870 [ 52.623900] ? __this_cpu_preempt_check+0x21/0x30 [ 52.624573] ? _raw_spin_unlock_irq+0x2c/0x60 [ 52.625204] ? lockdep_hardirqs_on+0x89/0x110 [ 52.625848] shrink_lruvec+0xd78/0x2790 [ 52.626422] ? __pfx_shrink_lruvec+0x10/0x10 [ 52.627040] ? __this_cpu_preempt_check+0x21/0x30 [ 52.627729] ? __this_cpu_preempt_check+0x21/0x30 [ 52.628423] ? trace_lock_acquire+0x139/0x1b0 [ 52.629061] ? trace_lock_acquire+0x139/0x1b0 [ 52.629752] shrink_node+0xb29/0x2870 [ 52.630305] ? __pfx_shrink_node+0x10/0x10 [ 52.630899] ? pgdat_balanced+0x1d4/0x230 [ 52.631490] balance_pgdat+0x9c2/0x1500 [ 52.632055] ? __pfx_balance_pgdat+0x10/0x10 [ 52.632669] ? __this_cpu_preempt_check+0x21/0x30 [ 52.633380] kswapd+0x765/0xe00 [ 52.633861] ? __pfx_kswapd+0x10/0x10 [ 52.634393] ? local_clock_noinstr+0xb0/0xd0 [ 52.635015] ? __pfx_autoremove_wake_function+0x10/0x10 [ 52.635759] ? __sanitizer_cov_trace_const_cmp1+0x1e/0x30 [ 52.636525] ? __kthread_parkme+0x15d/0x230 [ 52.637134] kthread+0x35a/0x470 [ 52.637616] ? __pfx_kswapd+0x10/0x10 [ 52.638146] ? __pfx_kthread+0x10/0x10 [ 52.638693] ret_from_fork+0x56/0x90 [ 52.639227] ? __pfx_kthread+0x10/0x10 [ 52.639778] ret_from_fork_asm+0x1a/0x30 [ 52.640391] </TASK> > Thanks, > Ming >