On Mon, Jan 13, 2025 at 10:58:07AM +0100, Thomas Hellström wrote:
> Hi,
> 
> On Mon, 2025-01-13 at 17:28 +0800, Ming Lei wrote:
> > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > > > > Ming, Others
> > > > > > > 
> > > 
> > > #2:
> > > [    5.595482] ======================================================
> > > [    5.596353] WARNING: possible circular locking dependency detected
> > > [    5.597231] 6.13.0-rc6+ #122 Tainted: G     U
> > > [    5.598182] ------------------------------------------------------
> > > [    5.599149] (udev-worker)/867 is trying to acquire lock:
> > > [    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
> > > [    5.600987]
> > >                but task is already holding lock:
> > > [    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
> > > [    5.603033]
> > >                which lock already depends on the new lock.
> > > 
> > > [    5.603034]
> > >                the existing dependency chain (in reverse order) is:
> > > [    5.603035]
> > >                -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> > > [    5.603038]        blk_alloc_queue+0x319/0x350
> > > [    5.603041]        blk_mq_alloc_queue+0x63/0xd0
> > 
> > The above one is solved in for-6.14/block of the block tree:
> > 
> > 	block: track queue dying state automatically for modeling
> > 	queue freeze lockdep
> > 
> > q->q_usage_counter(io) is killed because the disk isn't up yet.
> > 
> > If you apply the noio patch against for-6.14/block, the two splats
> > should have disappeared. If not, please post the lockdep log.
> 
> That dependency path above is the lockdep priming I suggested, which
> establishes the reclaim -> q->q_usage_counter(io) locking order.
> A splat without that priming would look slightly different and wouldn't
> occur until memory is actually exhausted. But it *will* occur.
> 
> That's why I suggested using the priming to catch all
> fs_reclaim->q_usage_counter(io) violations early, perhaps already at
> system boot, and anybody accidentally adding a GFP_KERNEL memory
> allocation under the q_usage_counter(io) lock would get a notification
> as soon as that allocation happens.
> 
> The actual deadlock sequence happens because kernfs_rwsem is taken
> under q_usage_counter(io) (excerpt from the report in [a]).
> If the priming is removed, the splat doesn't happen until reclaim, and
> will instead look like [b].

Got it. [b] is a new warning between the scheduler switch ('echo' to
/sys/block/$DEV/queue/scheduler) and fs reclaim from sysfs inode
allocation.

Three global or sub-system locks are involved:

- fs_reclaim
- root->kernfs_rwsem
- q->q_usage_counter(io)

The problem has existed since the blk-mq scheduler was introduced, and
it looks like a hard one because it is now difficult to avoid the
dependency among these locks.

I will think about it and see if we can figure out a solution.

Thanks,
Ming
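
For reference, the priming Thomas describes can be expressed with the
stock lockdep helpers roughly as below. This is an untested sketch, not
the actual annotation carried in the block tree; the demo_* names are
made up for illustration only:

#include <linux/lockdep.h>
#include <linux/sched/mm.h>
#include <linux/gfp.h>

static struct lock_class_key demo_freeze_key;
static struct lockdep_map demo_freeze_map;

/*
 * Record the "reclaim -> freeze" ordering once at init time, so any
 * later GFP_KERNEL allocation done while the freeze "lock" is held is
 * reported immediately, without waiting for real memory exhaustion.
 */
static void demo_prime_freeze_ordering(void)
{
	lockdep_init_map(&demo_freeze_map, "demo_q_usage_counter(io)",
			 &demo_freeze_key, 0);

	fs_reclaim_acquire(GFP_KERNEL);
	rwsem_acquire_read(&demo_freeze_map, 0, 0, _RET_IP_);
	rwsem_release(&demo_freeze_map, _RET_IP_);
	fs_reclaim_release(GFP_KERNEL);
}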
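
Similarly, the noio scoping the noio patch relies on is conceptually
just the following (again only a sketch, assuming the 6.13-era freeze
API where blk_mq_freeze_queue() returns void; demo_update_under_freeze
is a made-up name):

#include <linux/blk-mq.h>
#include <linux/sched/mm.h>

/*
 * Scope any allocation done while the queue is frozen as GFP_NOIO, so
 * reclaim entered from it cannot wait on the frozen queue and the
 * fs_reclaim -> q_usage_counter(io) edge is never taken.
 */
static void demo_update_under_freeze(struct request_queue *q)
{
	unsigned int memflags;

	blk_mq_freeze_queue(q);
	memflags = memalloc_noio_save();

	/* allocations here implicitly behave as GFP_NOIO */

	memalloc_noio_restore(memflags);
	blk_mq_unfreeze_queue(q);
}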