Hi,

On Mon, 2025-01-13 at 17:28 +0800, Ming Lei wrote:
> On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > > > Ming, Others
> > > > > >
> >
> > #2:
> > [ 5.595482] ======================================================
> > [ 5.596353] WARNING: possible circular locking dependency detected
> > [ 5.597231] 6.13.0-rc6+ #122 Tainted: G U
> > [ 5.598182] ------------------------------------------------------
> > [ 5.599149] (udev-worker)/867 is trying to acquire lock:
> > [ 5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
> > [ 5.600987] but task is already holding lock:
> > [ 5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
> > [ 5.603033] which lock already depends on the new lock.
> > [ 5.603034] the existing dependency chain (in reverse order) is:
> > [ 5.603035] -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> > [ 5.603038]        blk_alloc_queue+0x319/0x350
> > [ 5.603041]        blk_mq_alloc_queue+0x63/0xd0
>
> The above one is solved in for-6.14/block of block tree:
>
>     block: track queue dying state automatically for modeling
>     queue freeze lockdep
>
> q->q_usage_counter(io) is killed because disk isn't up yet.
>
> If you apply the noio patch against for-6.14/block, the two splats
> should have disappeared. If not, please post lockdep log.

The dependency path above is the lockdep priming I suggested, which
establishes the reclaim -> q->q_usage_counter(io) locking order. A splat
without that priming would look slightly different and wouldn't occur
until memory is actually exhausted. But it *will* occur.

That's why I suggested using the priming to catch all
fs_reclaim -> q_usage_counter(io) violations early, perhaps already at
system boot, so that anybody accidentally adding a GFP_KERNEL memory
allocation under the q_usage_counter(io) lock gets a notification as
soon as that allocation happens.

The actual deadlock sequence arises because kernfs_rwsem is taken under
q_usage_counter(io) (excerpt from the report in [a]). If the priming is
removed, the splat doesn't happen until reclaim, and instead looks
like [b].
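For reference, the priming itself only needs to be a one-off,
lockdep-only acquire/release pair done once per queue at allocation
time. Below is a minimal sketch of the idea; "io_lockdep_map" is a
placeholder name for whatever lockdep map models q_usage_counter(io),
and this is not meant to mirror the exact blk_alloc_queue() code:

#include <linux/blkdev.h>	/* struct request_queue, GFP_KERNEL */
#include <linux/lockdep.h>	/* lock_map_acquire_read(), lock_map_release() */
#include <linux/sched/mm.h>	/* fs_reclaim_acquire(), fs_reclaim_release() */

/*
 * One-off priming at queue allocation time: record the
 * fs_reclaim -> q_usage_counter(io) ordering with lockdep.
 *
 * Sketch only: "io_lockdep_map" is a placeholder for the lockdep map
 * that models q_usage_counter(io).
 */
static void blk_prime_reclaim_vs_queue_freeze(struct request_queue *q)
{
#ifdef CONFIG_LOCKDEP
	fs_reclaim_acquire(GFP_KERNEL);		   /* pretend we're in reclaim... */
	lock_map_acquire_read(&q->io_lockdep_map); /* ...then enter the queue */
	lock_map_release(&q->io_lockdep_map);
	fs_reclaim_release(GFP_KERNEL);
#endif
}

Once that order is on record, any GFP_KERNEL allocation made while
q_usage_counter(io) is held (for example with the queue frozen) splats
at the allocation site, rather than only when reclaim actually recurses
into the queue under memory pressure.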
Thanks,
Thomas

[a]

[ 5.603115] Chain exists of:
              &root->kernfs_rwsem --> fs_reclaim --> &q->q_usage_counter(io)#3
[ 5.603117]  Possible unsafe locking scenario:
[ 5.603117]        CPU0                    CPU1
[ 5.603117]        ----                    ----
[ 5.603118]   lock(&q->q_usage_counter(io)#3);
[ 5.603119]                                lock(fs_reclaim);
[ 5.603119]                                lock(&q->q_usage_counter(io)#3);
[ 5.603120]   lock(&root->kernfs_rwsem);
[ 5.603121]  *** DEADLOCK ***
[ 5.603121] 6 locks held by (udev-worker)/867:
[ 5.603122]  #0: ffff9211c16dd420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
[ 5.603125]  #1: ffff9211e28f3e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x121/0x240
[ 5.603128]  #2: ffff921203524f28 (kn->active#101){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x240
[ 5.603131]  #3: ffff9211e86f46d0 (&q->sysfs_lock){+.+.}-{4:4}, at: queue_attr_store+0x12b/0x180
[ 5.603133]  #4: ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[ 5.603136]  #5: ffff9211e86f41d8 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[ 5.603139] stack backtrace:
[ 5.603140] CPU: 4 UID: 0 PID: 867 Comm: (udev-worker) Tainted: G U 6.13.0-rc6+ #122
[ 5.603142] Tainted: [U]=USER
[ 5.603142] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[ 5.603143] Call Trace:
[ 5.603144]  <TASK>
[ 5.603146]  dump_stack_lvl+0x6e/0xa0
[ 5.603148]  print_circular_bug.cold+0x178/0x1be
[ 5.603151]  check_noncircular+0x148/0x160
[ 5.603154]  __lock_acquire+0x1339/0x2180
[ 5.603156]  lock_acquire+0xd0/0x2e0
[ 5.603158]  ? kernfs_remove+0x31/0x50
[ 5.603160]  ? sysfs_remove_dir+0x32/0x60
[ 5.603162]  ? lock_release+0xd2/0x2a0
[ 5.603164]  down_write+0x2e/0xb0
[ 5.603165]  ? kernfs_remove+0x31/0x50
[ 5.603166]  kernfs_remove+0x31/0x50
[ 5.

[b]

[ 157.543591] ======================================================
[ 157.543778] WARNING: possible circular locking dependency detected
[ 157.543787] 6.13.0-rc6+ #123 Tainted: G U
[ 157.543796] ------------------------------------------------------
[ 157.543805] git/2856 is trying to acquire lock:
[ 157.543812] ffff98b6bb882f10 (&q->q_usage_counter(io)#2){++++}-{0:0}, at: __submit_bio+0x80/0x220
[ 157.543830] but task is already holding lock:
[ 157.543839] ffffffffad65e1c0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x348/0xea0
[ 157.543855] which lock already depends on the new lock.
[ 157.543867] the existing dependency chain (in reverse order) is:
[ 157.543878] -> #2 (fs_reclaim){+.+.}-{0:0}:
[ 157.543888]        fs_reclaim_acquire+0x9d/0xd0
[ 157.543896]        kmem_cache_alloc_lru_noprof+0x57/0x3f0
[ 157.543906]        alloc_inode+0x97/0xc0
[ 157.543913]        iget_locked+0x141/0x310
[ 157.543921]        kernfs_get_inode+0x1a/0xf0
[ 157.543929]        kernfs_get_tree+0x17b/0x2c0
[ 157.543938]        sysfs_get_tree+0x1a/0x40
[ 157.543945]        vfs_get_tree+0x29/0xe0
[ 157.543953]        path_mount+0x49a/0xbd0
[ 157.543960]        __x64_sys_mount+0x119/0x150
[ 157.543968]        do_syscall_64+0x95/0x180
[ 157.543977]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.543986] -> #1 (&root->kernfs_rwsem){++++}-{4:4}:
[ 157.543997]        down_write+0x2e/0xb0
[ 157.544004]        kernfs_remove+0x31/0x50
[ 157.544012]        __kobject_del+0x2e/0x90
[ 157.544020]        kobject_del+0x13/0x30
[ 157.544026]        elevator_switch+0x44/0x2e0
[ 157.544034]        elv_iosched_store+0x174/0x1e0
[ 157.544043]        queue_attr_store+0x165/0x1b0
[ 157.544050]        kernfs_fop_write_iter+0x168/0x240
[ 157.544059]        vfs_write+0x2b2/0x540
[ 157.544066]        ksys_write+0x72/0xf0
[ 157.544073]        do_syscall_64+0x95/0x180
[ 157.544081]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.544090] -> #0 (&q->q_usage_counter(io)#2){++++}-{0:0}:
[ 157.544102]        __lock_acquire+0x1339/0x2180
[ 157.544110]        lock_acquire+0xd0/0x2e0
[ 157.544118]        blk_mq_submit_bio+0x88b/0xb60
[ 157.544127]        __submit_bio+0x80/0x220
[ 157.544135]        submit_bio_noacct_nocheck+0x324/0x420
[ 157.544144]        swap_writepage+0x399/0x580
[ 157.544152]        pageout+0x129/0x2d0
[ 157.544160]        shrink_folio_list+0x5a0/0xd80
[ 157.544168]        evict_folios+0x27d/0x7b0
[ 157.544175]        try_to_shrink_lruvec+0x21b/0x2b0
[ 157.544183]        shrink_one+0x102/0x1f0
[ 157.544191]        shrink_node+0xb8e/0x1300
[ 157.544198]        do_try_to_free_pages+0xb3/0x580
[ 157.544206]        try_to_free_pages+0xfa/0x2a0
[ 157.544214]        __alloc_pages_slowpath.constprop.0+0x36f/0xea0
[ 157.544224]        __alloc_pages_noprof+0x34c/0x390
[ 157.544233]        alloc_pages_mpol_noprof+0xd7/0x1c0
[ 157.544241]        pipe_write+0x3fc/0x7f0
[ 157.544574]        vfs_write+0x401/0x540
[ 157.544917]        ksys_write+0xd1/0xf0
[ 157.545246]        do_syscall_64+0x95/0x180
[ 157.545576]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.545909] other info that might help us debug this:
[ 157.546879] Chain exists of:
                &q->q_usage_counter(io)#2 --> &root->kernfs_rwsem --> fs_reclaim
[ 157.547849]  Possible unsafe locking scenario:
[ 157.548483]        CPU0                    CPU1
[ 157.548795]        ----                    ----
[ 157.549098]   lock(fs_reclaim);
[ 157.549400]                                lock(&root->kernfs_rwsem);
[ 157.549705]                                lock(fs_reclaim);
[ 157.550011]   rlock(&q->q_usage_counter(io)#2);
[ 157.550316]  *** DEADLOCK ***
[ 157.551194] 2 locks held by git/2856:
[ 157.551490]  #0: ffff98b6a221e068 (&pipe->mutex){+.+.}-{4:4}, at: pipe_write+0x5a/0x7f0
[ 157.551798]  #1: ffffffffad65e1c0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x348/0xea0
[ 157.552115] stack backtrace:
[ 157.552734] CPU: 5 UID: 1000 PID: 2856 Comm: git Tainted: G U 6.13.0-rc6+ #123
[ 157.553060] Tainted: [U]=USER
[ 157.553383] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[ 157.553718] Call Trace:
[ 157.554054]  <TASK>
[ 157.554389]  dump_stack_lvl+0x6e/0xa0
[ 157.554725]  print_circular_bug.cold+0x178/0x1be
[ 157.555064]  check_noncircular+0x148/0x160
[ 157.555408]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 157.555747]  ? unwind_get_return_address+0x23/0x40
[ 157.556085]  __lock_acquire+0x1339/0x2180
[ 157.556425]  lock_acquire+0xd0/0x2e0
[ 157.556761]  ? __submit_bio+0x80/0x220
[ 157.557110]  ? blk_mq_submit_bio+0x860/0xb60
[ 157.557447]  ? lock_release+0xd2/0x2a0
[ 157.557784]  blk_mq_submit_bio+0x88b/0xb60
[ 157.558137]  ? __submit_bio+0x80/0x220
[ 157.558476]  __submit_bio+0x80/0x220
[ 157.558828]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 157.559166]  ? submit_bio_noacct_nocheck+0x324/0x420
[ 157.559504]  submit_bio_noacct_nocheck+0x324/0x420
[ 157.559863]  swap_writepage+0x399/0x580
[ 157.560205]  pageout+0x129/0x2d0
[ 157.560542]  shrink_folio_list+0x5a0/0xd80
[ 157.560879]  ? evict_folios+0x25d/0x7b0
[ 157.561212]  evict_folios+0x27d/0x7b0
[ 157.561546]  try_to_shrink_lruvec+0x21b/0x2b0
[ 157.561890]  shrink_one+0x102/0x1f0
[ 157.562222]  shrink_node+0xb8e/0x1300
[ 157.562554]  ? shrink_node+0x9c1/0x1300
[ 157.562915]  ? shrink_node+0xb64/0x1300
[ 157.563245]  ? do_try_to_free_pages+0xb3/0x580
[ 157.563576]  do_try_to_free_pages+0xb3/0x580
[ 157.563922]  ? lock_release+0xd2/0x2a0
[ 157.564252]  try_to_free_pages+0xfa/0x2a0
[ 157.564583]  __alloc_pages_slowpath.constprop.0+0x36f/0xea0
[ 157.564946]  ? lock_release+0xd2/0x2a0
[ 157.565279]  __alloc_pages_noprof+0x34c/0x390
[ 157.565613]  alloc_pages_mpol_noprof+0xd7/0x1c0
[ 157.565952]  pipe_write+0x3fc/0x7f0
[ 157.566283]  vfs_write+0x401/0x540
[ 157.566615]  ksys_write+0xd1/0xf0
[ 157.566980]  do_syscall_64+0x95/0x180
[ 157.567312]  ? vfs_write+0x401/0x540
[ 157.567642]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[ 157.568001]  ? syscall_exit_to_user_mode+0x97/0x290
[ 157.568331]  ? do_syscall_64+0xa1/0x180
[ 157.568658]  ? do_syscall_64+0xa1/0x180
[ 157.569012]  ? syscall_exit_to_user_mode+0x97/0x290
[ 157.569337]  ? do_syscall_64+0xa1/0x180
[ 157.569658]  ? do_user_addr_fault+0x397/0x720
[ 157.569980]  ? trace_hardirqs_off+0x4b/0xc0
[ 157.570300]  ? clear_bhb_loop+0x45/0xa0
[ 157.570621]  ? clear_bhb_loop+0x45/0xa0
[ 157.570968]  ? clear_bhb_loop+0x45/0xa0
[ 157.571286]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ 157.571605] RIP: 0033:0x7fdf1ec2d484
[ 157.571966] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 45 9c 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[ 157.572322] RSP: 002b:00007ffd0eb6d068 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[ 157.572692] RAX: ffffffffffffffda RBX: 0000000000000331 RCX: 00007fdf1ec2d484
[ 157.573093] RDX: 0000000000000331 RSI: 000055693fe2d660 RDI: 0000000000000001
[ 157.573470] RBP: 00007ffd0eb6d090 R08: 000055693fdc6010 R09: 0000000000000007
[ 157.573875] R10: 0000556941b97c70 R11: 0000000000000202 R12: 0000000000000331
[ 157.574249] R13: 000055693fe2d660 R14: 00007fdf1ed305c0 R15: 00007fdf1ed2de80
[ 157.574621]  </TASK>

>
> Thanks,
> Ming
>