On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > Ming, Others
> > > > 
> > > > On 6.13-rc6 I'm seeing a couple of lockdep splats which appear
> > > > introduced by the commit
> > > > 
> > > > f1be1788a32e ("block: model freeze & enter queue as lock for
> > > > supporting lockdep")
> > > 
> > > The freeze lock connects all kinds of sub-system locks, that is
> > > why we see lots of warnings after the commit is merged.
> > > 
> > > ...
> > > 
> > > > #1
> > > > [  399.006581] ======================================================
> > > > [  399.006756] WARNING: possible circular locking dependency detected
> > > > [  399.006767] 6.12.0-rc4+ #1 Tainted: G     U     N
> > > > [  399.006776] ------------------------------------------------------
> > > > [  399.006801] kswapd0/116 is trying to acquire lock:
> > > > [  399.006810] ffff9a67a1284a28 (&q->q_usage_counter(io)){++++}-{0:0}, at: __submit_bio+0xf0/0x1c0
> > > > [  399.006845] but task is already holding lock:
> > > > [  399.006856] ffffffff8a65bf00 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe2/0xa20
> > > > [  399.006874]
> > > 
> > > The above one is solved in for-6.14/block of block tree:
> > > 
> > > 	block: track queue dying state automatically for modeling
> > > 	queue freeze lockdep
> > 
> > Hmm. I applied this series:
> > 
> > https://patchwork.kernel.org/project/linux-block/list/?series=912824&archive=both
> > 
> > on top of -rc6, but it didn't resolve that splat. Am I using the
> > correct patches?
> > 
> > Perhaps it might be a good idea to reclaim-prime those lockdep maps
> > taken during reclaim to have the splats happen earlier.
> 
> for-6.14/block does kill the dependency between fs_reclaim and
> q->q_usage_counter(io) in scsi_add_lun() when the scsi disk isn't
> added yet.
> 
> Maybe it is another warning, care to post the warning log here?

Ah, you're right, it's a different warning this time. I've posted the
warning below. (Note: this is also with Christoph's series applied on
top.)

May I also humbly suggest the following lockdep priming, so that the
reclaim lockdep splats can be caught early, without reclaim needing to
happen? That would also pick up splat #2 below.
8<-------------------------------------------------------------
diff --git a/block/blk-core.c b/block/blk-core.c
index 32fb28a6372c..2dd8dc9aed7f 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -458,6 +458,11 @@ struct request_queue *blk_alloc_queue(struct queue_limits *lim, int node_id)
 
 	q->nr_requests = BLKDEV_DEFAULT_RQ;
 
+	fs_reclaim_acquire(GFP_KERNEL);
+	rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
+	rwsem_release(&q->io_lockdep_map, _RET_IP_);
+	fs_reclaim_release(GFP_KERNEL);
+
 	return q;
 
 fail_stats:
8<-------------------------------------------------------------
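(To spell out what the hunk does: at blk_alloc_queue() time it enters a
pretend reclaim context with fs_reclaim_acquire() and takes the queue's
io lockdep map inside it, so the dependency
fs_reclaim -> q->q_usage_counter(io) is recorded once, deterministically,
when the queue is created. Any path that later freezes the queue and then
allocates with GFP_KERNEL produces the splat immediately, rather than
only when kswapd happens to write back through that particular queue.)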
#1:

[  106.921533] ======================================================
[  106.921716] WARNING: possible circular locking dependency detected
[  106.921725] 6.13.0-rc6+ #121 Tainted: G     U
[  106.921734] ------------------------------------------------------
[  106.921743] kswapd0/117 is trying to acquire lock:
[  106.921751] ffff8ff4e2da09f0 (&q->q_usage_counter(io)){++++}-{0:0}, at: __submit_bio+0x80/0x220
[  106.921769] but task is already holding lock:
[  106.921778] ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe2/0xa10
[  106.921791] which lock already depends on the new lock.
[  106.921803] the existing dependency chain (in reverse order) is:
[  106.921814] -> #1 (fs_reclaim){+.+.}-{0:0}:
[  106.921824]        fs_reclaim_acquire+0x9d/0xd0
[  106.921833]        __kmalloc_cache_node_noprof+0x5d/0x3f0
[  106.921842]        blk_mq_init_tags+0x3d/0xb0
[  106.921851]        blk_mq_alloc_map_and_rqs+0x4e/0x3d0
[  106.921860]        blk_mq_init_sched+0x100/0x260
[  106.921868]        elevator_switch+0x8d/0x2e0
[  106.921877]        elv_iosched_store+0x174/0x1e0
[  106.921885]        queue_attr_store+0x142/0x180
[  106.921893]        kernfs_fop_write_iter+0x168/0x240
[  106.921902]        vfs_write+0x2b2/0x540
[  106.921910]        ksys_write+0x72/0xf0
[  106.921916]        do_syscall_64+0x95/0x180
[  106.921925]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  106.921935] -> #0 (&q->q_usage_counter(io)){++++}-{0:0}:
[  106.921946]        __lock_acquire+0x1339/0x2180
[  106.921955]        lock_acquire+0xd0/0x2e0
[  106.921963]        blk_mq_submit_bio+0x88b/0xb60
[  106.921972]        __submit_bio+0x80/0x220
[  106.921980]        submit_bio_noacct_nocheck+0x324/0x420
[  106.921989]        swap_writepage+0x399/0x580
[  106.921997]        pageout+0x129/0x2d0
[  106.922005]        shrink_folio_list+0x5a0/0xd80
[  106.922013]        evict_folios+0x27d/0x7b0
[  106.922020]        try_to_shrink_lruvec+0x21b/0x2b0
[  106.922028]        shrink_one+0x102/0x1f0
[  106.922035]        shrink_node+0xb8e/0x1300
[  106.922043]        balance_pgdat+0x550/0xa10
[  106.922050]        kswapd+0x20a/0x440
[  106.922057]        kthread+0xd2/0x100
[  106.922064]        ret_from_fork+0x31/0x50
[  106.922072]        ret_from_fork_asm+0x1a/0x30
[  106.922080] other info that might help us debug this:
[  106.922092]  Possible unsafe locking scenario:
[  106.922101]        CPU0                    CPU1
[  106.922108]        ----                    ----
[  106.922115]   lock(fs_reclaim);
[  106.922121]                                lock(&q->q_usage_counter(io));
[  106.922132]                                lock(fs_reclaim);
[  106.922141]   rlock(&q->q_usage_counter(io));
[  106.922148]  *** DEADLOCK ***
[  106.922476] 1 lock held by kswapd0/117:
[  106.922802]  #0: ffffffff8e65e1c0 (fs_reclaim){+.+.}-{0:0}, at: balance_pgdat+0xe2/0xa10
[  106.923138] stack backtrace:
[  106.923806] CPU: 3 UID: 0 PID: 117 Comm: kswapd0 Tainted: G     U            6.13.0-rc6+ #121
[  106.924173] Tainted: [U]=USER
[  106.924523] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[  106.924882] Call Trace:
[  106.925223]  <TASK>
[  106.925559]  dump_stack_lvl+0x6e/0xa0
[  106.925893]  print_circular_bug.cold+0x178/0x1be
[  106.926233]  check_noncircular+0x148/0x160
[  106.926565]  ? unwind_next_frame+0x42a/0x750
[  106.926905]  __lock_acquire+0x1339/0x2180
[  106.927227]  lock_acquire+0xd0/0x2e0
[  106.927546]  ? __submit_bio+0x80/0x220
[  106.927892]  ? blk_mq_submit_bio+0x860/0xb60
[  106.928212]  ? lock_release+0xd2/0x2a0
[  106.928536]  blk_mq_submit_bio+0x88b/0xb60
[  106.928850]  ? __submit_bio+0x80/0x220
[  106.929184]  __submit_bio+0x80/0x220
[  106.929499]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[  106.929833]  ? submit_bio_noacct_nocheck+0x324/0x420
[  106.930147]  submit_bio_noacct_nocheck+0x324/0x420
[  106.930464]  swap_writepage+0x399/0x580
[  106.930794]  pageout+0x129/0x2d0
[  106.931114]  shrink_folio_list+0x5a0/0xd80
[  106.931447]  ? evict_folios+0x25d/0x7b0
[  106.931776]  evict_folios+0x27d/0x7b0
[  106.932092]  try_to_shrink_lruvec+0x21b/0x2b0
[  106.932410]  shrink_one+0x102/0x1f0
[  106.932742]  shrink_node+0xb8e/0x1300
[  106.933056]  ? shrink_node+0x9c1/0x1300
[  106.933368]  ? shrink_node+0xb64/0x1300
[  106.933679]  ? balance_pgdat+0x550/0xa10
[  106.933988]  balance_pgdat+0x550/0xa10
[  106.934296]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[  106.934607]  ? finish_task_switch.isra.0+0xc4/0x2a0
[  106.934920]  kswapd+0x20a/0x440
[  106.935229]  ? __pfx_autoremove_wake_function+0x10/0x10
[  106.935542]  ? __pfx_kswapd+0x10/0x10
[  106.935881]  kthread+0xd2/0x100
[  106.936191]  ? __pfx_kthread+0x10/0x10
[  106.936501]  ret_from_fork+0x31/0x50
[  106.936810]  ? __pfx_kthread+0x10/0x10
[  106.937120]  ret_from_fork_asm+0x1a/0x30
[  106.937433]  </TASK>
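If I read #1 correctly, the two orderings in conflict are the elevator
switch (freeze the queue, then kmalloc(GFP_KERNEL), which may enter
reclaim) and kswapd writeback (enter reclaim, then submit a bio, which
waits on the frozen queue). A minimal userspace sketch of the same
inversion, with plain pthread mutexes standing in for the fs_reclaim
and q->q_usage_counter(io) lockdep maps (illustration only, not kernel
code; with unlucky timing it hangs, which is exactly the inversion
lockdep flags):

/*
 * Userspace analogue of splat #1. The mutex names are just labels for
 * the two kernel lockdep maps involved; build with: cc -pthread.
 */
#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t fs_reclaim = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t q_usage_counter_io = PTHREAD_MUTEX_INITIALIZER;

/* elv_iosched_store() path: freeze the queue, then allocate GFP_KERNEL */
static void *elevator_switch_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&q_usage_counter_io);  /* blk_mq_freeze_queue() */
	pthread_mutex_lock(&fs_reclaim);          /* blk_mq_init_tags() -> kmalloc */
	pthread_mutex_unlock(&fs_reclaim);
	pthread_mutex_unlock(&q_usage_counter_io);
	return NULL;
}

/* kswapd path: enter reclaim, then write a swap page to the same queue */
static void *reclaim_path(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&fs_reclaim);          /* balance_pgdat() */
	pthread_mutex_lock(&q_usage_counter_io);  /* swap_writepage() -> __submit_bio() */
	pthread_mutex_unlock(&q_usage_counter_io);
	pthread_mutex_unlock(&fs_reclaim);
	return NULL;
}

int main(void)
{
	pthread_t a, b;

	pthread_create(&a, NULL, elevator_switch_path, NULL);
	pthread_create(&b, NULL, reclaim_path, NULL);
	pthread_join(a, NULL);
	pthread_join(b, NULL);
	printf("finished (lucky interleaving; the lock order is still inverted)\n");
	return 0;
}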
#2:

[    5.595482] ======================================================
[    5.596353] WARNING: possible circular locking dependency detected
[    5.597231] 6.13.0-rc6+ #122 Tainted: G     U
[    5.598182] ------------------------------------------------------
[    5.599149] (udev-worker)/867 is trying to acquire lock:
[    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
[    5.600987] but task is already holding lock:
[    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603033] which lock already depends on the new lock.
[    5.603034] the existing dependency chain (in reverse order) is:
[    5.603035] -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
[    5.603038]        blk_alloc_queue+0x319/0x350
[    5.603041]        blk_mq_alloc_queue+0x63/0xd0
[    5.603043]        scsi_alloc_sdev+0x281/0x3c0
[    5.603045]        scsi_probe_and_add_lun+0x1f5/0x450
[    5.603046]        __scsi_scan_target+0x112/0x230
[    5.603048]        scsi_scan_channel+0x59/0x90
[    5.603049]        scsi_scan_host_selected+0xe5/0x120
[    5.603051]        do_scan_async+0x1b/0x160
[    5.603052]        async_run_entry_fn+0x31/0x130
[    5.603055]        process_one_work+0x21a/0x590
[    5.603058]        worker_thread+0x1c3/0x3b0
[    5.603059]        kthread+0xd2/0x100
[    5.603061]        ret_from_fork+0x31/0x50
[    5.603064]        ret_from_fork_asm+0x1a/0x30
[    5.603066] -> #1 (fs_reclaim){+.+.}-{0:0}:
[    5.603068]        fs_reclaim_acquire+0x9d/0xd0
[    5.603070]        kmem_cache_alloc_lru_noprof+0x57/0x3f0
[    5.603072]        alloc_inode+0x97/0xc0
[    5.603074]        iget_locked+0x141/0x310
[    5.603076]        kernfs_get_inode+0x1a/0xf0
[    5.603077]        kernfs_get_tree+0x17b/0x2c0
[    5.603080]        sysfs_get_tree+0x1a/0x40
[    5.603081]        vfs_get_tree+0x29/0xe0
[    5.603083]        path_mount+0x49a/0xbd0
[    5.603085]        __x64_sys_mount+0x119/0x150
[    5.603086]        do_syscall_64+0x95/0x180
[    5.603089]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    5.603092] -> #0 (&root->kernfs_rwsem){++++}-{4:4}:
[    5.603094]        __lock_acquire+0x1339/0x2180
[    5.603097]        lock_acquire+0xd0/0x2e0
[    5.603099]        down_write+0x2e/0xb0
[    5.603101]        kernfs_remove+0x31/0x50
[    5.603103]        __kobject_del+0x2e/0x90
[    5.603104]        kobject_del+0x13/0x30
[    5.603104]        elevator_switch+0x44/0x2e0
[    5.603106]        elv_iosched_store+0x174/0x1e0
[    5.603107]        queue_attr_store+0x142/0x180
[    5.603108]        kernfs_fop_write_iter+0x168/0x240
[    5.603110]        vfs_write+0x2b2/0x540
[    5.603111]        ksys_write+0x72/0xf0
[    5.603111]        do_syscall_64+0x95/0x180
[    5.603113]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    5.603114] other info that might help us debug this:
[    5.603115] Chain exists of: &root->kernfs_rwsem --> fs_reclaim --> &q->q_usage_counter(io)#3
[    5.603117]  Possible unsafe locking scenario:
[    5.603117]        CPU0                    CPU1
[    5.603117]        ----                    ----
[    5.603118]   lock(&q->q_usage_counter(io)#3);
[    5.603119]                                lock(fs_reclaim);
[    5.603119]                                lock(&q->q_usage_counter(io)#3);
[    5.603120]   lock(&root->kernfs_rwsem);
[    5.603121]  *** DEADLOCK ***
[    5.603121] 6 locks held by (udev-worker)/867:
[    5.603122]  #0: ffff9211c16dd420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
[    5.603125]  #1: ffff9211e28f3e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x121/0x240
[    5.603128]  #2: ffff921203524f28 (kn->active#101){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x240
[    5.603131]  #3: ffff9211e86f46d0 (&q->sysfs_lock){+.+.}-{4:4}, at: queue_attr_store+0x12b/0x180
[    5.603133]  #4: ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603136]  #5: ffff9211e86f41d8 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603139] stack backtrace:
[    5.603140] CPU: 4 UID: 0 PID: 867 Comm: (udev-worker) Tainted: G     U            6.13.0-rc6+ #122
[    5.603142] Tainted: [U]=USER
[    5.603142] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[    5.603143] Call Trace:
[    5.603144]  <TASK>
[    5.603146]  dump_stack_lvl+0x6e/0xa0
[    5.603148]  print_circular_bug.cold+0x178/0x1be
[    5.603151]  check_noncircular+0x148/0x160
[    5.603154]  __lock_acquire+0x1339/0x2180
[    5.603156]  lock_acquire+0xd0/0x2e0
[    5.603158]  ? kernfs_remove+0x31/0x50
[    5.603160]  ? sysfs_remove_dir+0x32/0x60
[    5.603162]  ? lock_release+0xd2/0x2a0
[    5.603164]  down_write+0x2e/0xb0
[    5.603165]  ? kernfs_remove+0x31/0x50
[    5.603166]  kernfs_remove+0x31/0x50
[    5.603168]  __kobject_del+0x2e/0x90
[    5.603170]  elevator_switch+0x44/0x2e0
[    5.603172]  elv_iosched_store+0x174/0x1e0
[    5.603174]  queue_attr_store+0x142/0x180
[    5.603176]  ? lock_acquire+0xd0/0x2e0
[    5.603177]  ? kernfs_fop_write_iter+0x12a/0x240
[    5.603179]  ? lock_is_held_type+0x9a/0x110
[    5.603182]  kernfs_fop_write_iter+0x168/0x240
[    5.657060]  vfs_write+0x2b2/0x540
[    5.657470]  ksys_write+0x72/0xf0
[    5.657475]  do_syscall_64+0x95/0x180
[    5.657480]  ? lock_acquire+0xd0/0x2e0
[    5.657484]  ? ktime_get_coarse_real_ts64+0x12/0x60
[    5.657486]  ? find_held_lock+0x2b/0x80
[    5.657489]  ? ktime_get_coarse_real_ts64+0x12/0x60
[    5.657490]  ? file_has_perm+0xa9/0xf0
[    5.657494]  ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[    5.657499]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[    5.657501]  ? syscall_exit_to_user_mode+0x97/0x290
[    5.657504]  ? do_syscall_64+0xa1/0x180
[    5.657507]  ? lock_acquire+0xd0/0x2e0
[    5.662389]  ? fd_install+0x3e/0x300
[    5.662395]  ? find_held_lock+0x2b/0x80
[    5.663189]  ? fd_install+0xbb/0x300
[    5.663194]  ? do_sys_openat2+0x9c/0xe0
[    5.664093]  ? kmem_cache_free+0x13e/0x450
[    5.664099]  ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[    5.664952]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[    5.664956]  ? syscall_exit_to_user_mode+0x97/0x290
[    5.664961]  ? do_syscall_64+0xa1/0x180
[    5.664964]  ? syscall_exit_to_user_mode_prepare+0x21b/0x250
[    5.664967]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[    5.664969]  ? syscall_exit_to_user_mode+0x97/0x290
[    5.664972]  ? do_syscall_64+0xa1/0x180
[    5.664974]  ? clear_bhb_loop+0x45/0xa0
[    5.664977]  ? clear_bhb_loop+0x45/0xa0
[    5.664979]  ? clear_bhb_loop+0x45/0xa0
[    5.664982]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[    5.664985] RIP: 0033:0x7fe72d2f4484
[    5.664988] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 45 9c 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[    5.664990] RSP: 002b:00007ffe51665998 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[    5.664992] RAX: ffffffffffffffda RBX: 0000000000000003 RCX: 00007fe72d2f4484
[    5.664994] RDX: 0000000000000003 RSI: 00007ffe51665ca0 RDI: 0000000000000038
[    5.664995] RBP: 00007ffe516659c0 R08: 00007fe72d3f51c8 R09: 00007ffe51665a70
[    5.664996] R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000003
[    5.664997] R13: 00007ffe51665ca0 R14: 000055a1bab093b0 R15: 00007fe72d3f4e80
[    5.665001]  </TASK>
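For #2, if I read the chain correctly, the three edges are: (1)
&root->kernfs_rwsem -> fs_reclaim, from sysfs inode allocation during
mount; (2) fs_reclaim -> &q->q_usage_counter(io), recorded at
blk_alloc_queue() time (this is what the priming above captures, here
during the async SCSI scan); and (3) the new edge
&q->q_usage_counter(io) -> &root->kernfs_rwsem, from elevator_switch()
calling kobject_del() while holding the queue frozen. So it is the
elevator sysfs teardown under freeze that closes the cycle.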
Thanks,
Thomas