On Mon, Jan 13, 2025 at 10:58:07AM +0100, Thomas Hellström wrote:
> Hi,
> 
> On Mon, 2025-01-13 at 17:28 +0800, Ming Lei wrote:
> > On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström wrote:
> > > > > > > Ming, Others
> > > > > > > 
> > > 
> > > #2:
> > > [    5.595482] ======================================================
> > > [    5.596353] WARNING: possible circular locking dependency detected
> > > [    5.597231] 6.13.0-rc6+ #122 Tainted: G     U
> > > [    5.598182] ------------------------------------------------------
> > > [    5.599149] (udev-worker)/867 is trying to acquire lock:
> > > [    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
> > > [    5.600987]
> > >                but task is already holding lock:
> > > [    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
> > > [    5.603033]
> > >                which lock already depends on the new lock.
> > > 
> > > [    5.603034]
> > >                the existing dependency chain (in reverse order) is:
> > > [    5.603035]
> > >                -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> > > [    5.603038]        blk_alloc_queue+0x319/0x350
> > > [    5.603041]        blk_mq_alloc_queue+0x63/0xd0
> > 
> > The above one is solved in for-6.14/block of the block tree:
> > 
> > 	block: track queue dying state automatically for modeling
> > 	queue freeze lockdep
> > 
> > q->q_usage_counter(io) is killed because the disk isn't up yet.
> > 
> > If you apply the noio patch against for-6.14/block, the two splats
> > should have disappeared. If not, please post the lockdep log.
> 
> That dependency path above is the lockdep priming I suggested, which
> establishes the reclaim -> q->q_usage_counter(io) locking order.
> A splat without that priming would look slightly different and wouldn't
> occur until memory is actually exhausted. But it *will* occur.
> 
> That's why I suggested using the priming to catch all
> fs_reclaim->q_usage_counter(io) violations early, perhaps already at
> system boot, and anybody accidentally adding a GFP_KERNEL memory
> allocation under the q_usage_counter(io) lock would get a notification
> as soon as that allocation happens.
> 
> The actual deadlock sequence happens because kernfs_rwsem is taken
> under q_usage_counter(io) (excerpt from the report in [a]).
> If the priming is removed, the splat doesn't happen until reclaim, and
> will instead look like [b].

Got it. [b] is a new warning between the scheduler switch ('echo' to
/sys/block/$DEV/queue/scheduler) and fs reclaim from sysfs inode
allocation.

Three global or sub-system locks are involved:

- fs_reclaim
- root->kernfs_rwsem
- q->q_usage_counter(io)

The problem has existed since the blk-mq scheduler was introduced, and
it looks like a hard one because it is now difficult to avoid the
dependency among these locks.

I will think about it and see if we can figure out a solution.

Thanks,
Ming
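
For reference, the priming Thomas describes can be expressed with the
stock lockdep helpers roughly as below. This is an untested sketch, not
the actual annotation carried in the block tree; the demo_* names are
made up for illustration only:

#include <linux/lockdep.h>
#include <linux/sched/mm.h>
#include <linux/gfp.h>

static struct lock_class_key demo_freeze_key;
static struct lockdep_map demo_freeze_map;

/*
 * Record the "reclaim -> freeze" ordering once at init time, so any
 * later GFP_KERNEL allocation done while the freeze "lock" is held is
 * reported immediately, without waiting for real memory exhaustion.
 */
static void demo_prime_freeze_ordering(void)
{
	lockdep_init_map(&demo_freeze_map, "demo_q_usage_counter(io)",
			 &demo_freeze_key, 0);

	fs_reclaim_acquire(GFP_KERNEL);
	rwsem_acquire_read(&demo_freeze_map, 0, 0, _RET_IP_);
	rwsem_release(&demo_freeze_map, _RET_IP_);
	fs_reclaim_release(GFP_KERNEL);
}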
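
Similarly, the noio scoping the noio patch relies on is conceptually
just the following (again only a sketch, assuming the 6.13-era freeze
API where blk_mq_freeze_queue() returns void; demo_update_under_freeze
is a made-up name):

#include <linux/blk-mq.h>
#include <linux/sched/mm.h>

/*
 * Scope any allocation done while the queue is frozen as GFP_NOIO, so
 * reclaim entered from it cannot wait on the frozen queue and the
 * fs_reclaim -> q_usage_counter(io) edge is never taken.
 */
static void demo_update_under_freeze(struct request_queue *q)
{
	unsigned int memflags;

	blk_mq_freeze_queue(q);
	memflags = memalloc_noio_save();

	/* allocations here implicitly behave as GFP_NOIO */

	memalloc_noio_restore(memflags);
	blk_mq_unfreeze_queue(q);
}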