Re: Blockdev 6.13-rc lockdep splat regressions

Hi,

On Mon, 2025-01-13 at 17:28 +0800, Ming Lei wrote:
> On Sun, Jan 12, 2025 at 12:33:13PM +0100, Thomas Hellström wrote:
> > On Sat, 2025-01-11 at 11:05 +0800, Ming Lei wrote:
> > > On Fri, Jan 10, 2025 at 03:36:44PM +0100, Thomas Hellström wrote:
> > > > On Fri, 2025-01-10 at 20:13 +0800, Ming Lei wrote:
> > > > > On Fri, Jan 10, 2025 at 11:12:58AM +0100, Thomas Hellström
> > > > > wrote:
> > > > > > Ming, Others
> > > > > > 
> > 
> > #2:
> > [    5.595482] ======================================================
> > [    5.596353] WARNING: possible circular locking dependency detected
> > [    5.597231] 6.13.0-rc6+ #122 Tainted: G     U
> > [    5.598182] ------------------------------------------------------
> > [    5.599149] (udev-worker)/867 is trying to acquire lock:
> > [    5.600075] ffff9211c02f7948 (&root->kernfs_rwsem){++++}-{4:4}, at: kernfs_remove+0x31/0x50
> > [    5.600987]
> >                but task is already holding lock:
> > [    5.603025] ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
> > [    5.603033]
> >                which lock already depends on the new lock.
> > 
> > [    5.603034]
> >                the existing dependency chain (in reverse order) is:
> > [    5.603035]
> >                -> #2 (&q->q_usage_counter(io)#3){++++}-{0:0}:
> > [    5.603038]        blk_alloc_queue+0x319/0x350
> > [    5.603041]        blk_mq_alloc_queue+0x63/0xd0
> 
> The above one is solved in for-6.14/block of block tree:
> 
> 	block: track queue dying state automatically for modeling
> queue freeze lockdep
> 
> q->q_usage_counter(io) is killed because disk isn't up yet.
> 
> If you apply the noio patch against for-6.1/block, the two splats
> should
> have disappeared. If not, please post lockdep log.

The dependency path above is the lockdep priming I suggested, which
establishes the fs_reclaim -> q->q_usage_counter(io) locking order.
A splat without that priming would look slightly different and wouldn't
occur until memory is actually exhausted. But it *will* occur.

That's why I suggested using the priming to catch all
fs_reclaim -> q_usage_counter(io) violations early, perhaps already at
system boot: anybody accidentally adding a GFP_KERNEL memory allocation
under the q_usage_counter(io) lock would then get a notification as soon
as that allocation happens.
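
For reference, a minimal sketch of what I mean by priming. The helper
name is made up and the io_lockdep_map field name is from memory, so
treat this as illustrative of the idea rather than the exact upstream
code in blk_alloc_queue():

/*
 * Record the fs_reclaim -> q_usage_counter(io) ordering once, at queue
 * allocation time, so lockdep learns the dependency up front instead of
 * only when reclaim actually recurses into the block layer.
 */
static void blk_queue_prime_freeze_lockdep(struct request_queue *q)
{
	fs_reclaim_acquire(GFP_KERNEL);   /* pretend we are inside reclaim  */
	rwsem_acquire_read(&q->io_lockdep_map, 0, 0, _RET_IP_);
	rwsem_release(&q->io_lockdep_map, _RET_IP_);
	fs_reclaim_release(GFP_KERNEL);   /* leave the fake reclaim context */
}

With that ordering recorded, any allocation that may enter reclaim while
the freeze "lock" is held closes the circle immediately, which is exactly
what the splat above reports.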

The actual deadlock happens because kernfs_rwsem is taken under
q_usage_counter(io); see the excerpt from the report in [a] below.
If the priming is removed, the splat doesn't happen until reclaim and
instead looks like [b].
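
As an aside, the simplest shape of the violation class the priming is
meant to catch looks like the hypothetical handler below (not actual
upstream code); the reports in [a] and [b] hit the same inversion one
step removed, via kernfs_rwsem:

static ssize_t example_store(struct request_queue *q, const char *page,
			     size_t count)
{
	void *buf;

	blk_mq_freeze_queue(q);            /* takes q_usage_counter(io)   */
	buf = kmalloc(count, GFP_KERNEL);  /* may recurse into fs_reclaim */
	if (!buf) {
		blk_mq_unfreeze_queue(q);
		return -ENOMEM;
	}
	/* ... apply the new attribute value ... */
	kfree(buf);
	blk_mq_unfreeze_queue(q);
	return count;
}

With the priming, lockdep complains the moment the kmalloc() happens;
without it, nothing is reported until reclaim actually writes out pages
under the frozen queue, as in [b].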

Thanks,
Thomas


[a]
[    5.603115] Chain exists of:
                 &root->kernfs_rwsem --> fs_reclaim --> &q->q_usage_counter(io)#3

[    5.603117]  Possible unsafe locking scenario:

[    5.603117]        CPU0                    CPU1
[    5.603117]        ----                    ----
[    5.603118]   lock(&q->q_usage_counter(io)#3);
[    5.603119]                                lock(fs_reclaim);
[    5.603119]                                lock(&q->q_usage_counter(io)#3);
[    5.603120]   lock(&root->kernfs_rwsem);
[    5.603121]
                *** DEADLOCK ***

[    5.603121] 6 locks held by (udev-worker)/867:
[    5.603122]  #0: ffff9211c16dd420 (sb_writers#4){.+.+}-{0:0}, at: ksys_write+0x72/0xf0
[    5.603125]  #1: ffff9211e28f3e88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x121/0x240
[    5.603128]  #2: ffff921203524f28 (kn->active#101){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x12a/0x240
[    5.603131]  #3: ffff9211e86f46d0 (&q->sysfs_lock){+.+.}-{4:4}, at: queue_attr_store+0x12b/0x180
[    5.603133]  #4: ffff9211e86f41a0 (&q->q_usage_counter(io)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603136]  #5: ffff9211e86f41d8 (&q->q_usage_counter(queue)#3){++++}-{0:0}, at: blk_mq_freeze_queue+0x12/0x20
[    5.603139]
               stack backtrace:
[    5.603140] CPU: 4 UID: 0 PID: 867 Comm: (udev-worker) Tainted: G     U             6.13.0-rc6+ #122
[    5.603142] Tainted: [U]=USER
[    5.603142] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[    5.603143] Call Trace:
[    5.603144]  <TASK>
[    5.603146]  dump_stack_lvl+0x6e/0xa0
[    5.603148]  print_circular_bug.cold+0x178/0x1be
[    5.603151]  check_noncircular+0x148/0x160
[    5.603154]  __lock_acquire+0x1339/0x2180
[    5.603156]  lock_acquire+0xd0/0x2e0
[    5.603158]  ? kernfs_remove+0x31/0x50
[    5.603160]  ? sysfs_remove_dir+0x32/0x60
[    5.603162]  ? lock_release+0xd2/0x2a0
[    5.603164]  down_write+0x2e/0xb0
[    5.603165]  ? kernfs_remove+0x31/0x50
[    5.603166]  kernfs_remove+0x31/0x50
[    5.

[b]

[  157.543591] ======================================================
[  157.543778] WARNING: possible circular locking dependency detected
[  157.543787] 6.13.0-rc6+ #123 Tainted: G     U
[  157.543796] ------------------------------------------------------
[  157.543805] git/2856 is trying to acquire lock:
[  157.543812] ffff98b6bb882f10 (&q->q_usage_counter(io)#2){++++}-{0:0}, at: __submit_bio+0x80/0x220
[  157.543830]
               but task is already holding lock:
[  157.543839] ffffffffad65e1c0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x348/0xea0
[  157.543855] 
               which lock already depends on the new lock.

[  157.543867] 
               the existing dependency chain (in reverse order) is:
[  157.543878] 
               -> #2 (fs_reclaim){+.+.}-{0:0}:
[  157.543888]        fs_reclaim_acquire+0x9d/0xd0
[  157.543896]        kmem_cache_alloc_lru_noprof+0x57/0x3f0
[  157.543906]        alloc_inode+0x97/0xc0
[  157.543913]        iget_locked+0x141/0x310
[  157.543921]        kernfs_get_inode+0x1a/0xf0
[  157.543929]        kernfs_get_tree+0x17b/0x2c0
[  157.543938]        sysfs_get_tree+0x1a/0x40
[  157.543945]        vfs_get_tree+0x29/0xe0
[  157.543953]        path_mount+0x49a/0xbd0
[  157.543960]        __x64_sys_mount+0x119/0x150
[  157.543968]        do_syscall_64+0x95/0x180
[  157.543977]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  157.543986] 
               -> #1 (&root->kernfs_rwsem){++++}-{4:4}:
[  157.543997]        down_write+0x2e/0xb0
[  157.544004]        kernfs_remove+0x31/0x50
[  157.544012]        __kobject_del+0x2e/0x90
[  157.544020]        kobject_del+0x13/0x30
[  157.544026]        elevator_switch+0x44/0x2e0
[  157.544034]        elv_iosched_store+0x174/0x1e0
[  157.544043]        queue_attr_store+0x165/0x1b0
[  157.544050]        kernfs_fop_write_iter+0x168/0x240
[  157.544059]        vfs_write+0x2b2/0x540
[  157.544066]        ksys_write+0x72/0xf0
[  157.544073]        do_syscall_64+0x95/0x180
[  157.544081]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  157.544090] 
               -> #0 (&q->q_usage_counter(io)#2){++++}-{0:0}:
[  157.544102]        __lock_acquire+0x1339/0x2180
[  157.544110]        lock_acquire+0xd0/0x2e0
[  157.544118]        blk_mq_submit_bio+0x88b/0xb60
[  157.544127]        __submit_bio+0x80/0x220
[  157.544135]        submit_bio_noacct_nocheck+0x324/0x420
[  157.544144]        swap_writepage+0x399/0x580
[  157.544152]        pageout+0x129/0x2d0
[  157.544160]        shrink_folio_list+0x5a0/0xd80
[  157.544168]        evict_folios+0x27d/0x7b0
[  157.544175]        try_to_shrink_lruvec+0x21b/0x2b0
[  157.544183]        shrink_one+0x102/0x1f0
[  157.544191]        shrink_node+0xb8e/0x1300
[  157.544198]        do_try_to_free_pages+0xb3/0x580
[  157.544206]        try_to_free_pages+0xfa/0x2a0
[  157.544214]        __alloc_pages_slowpath.constprop.0+0x36f/0xea0
[  157.544224]        __alloc_pages_noprof+0x34c/0x390
[  157.544233]        alloc_pages_mpol_noprof+0xd7/0x1c0
[  157.544241]        pipe_write+0x3fc/0x7f0
[  157.544574]        vfs_write+0x401/0x540
[  157.544917]        ksys_write+0xd1/0xf0
[  157.545246]        do_syscall_64+0x95/0x180
[  157.545576]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  157.545909] 
               other info that might help us debug this:

[  157.546879] Chain exists of:
                 &q->q_usage_counter(io)#2 --> &root->kernfs_rwsem --> fs_reclaim

[  157.547849]  Possible unsafe locking scenario:

[  157.548483]        CPU0                    CPU1
[  157.548795]        ----                    ----
[  157.549098]   lock(fs_reclaim);
[  157.549400]                                lock(&root->kernfs_rwsem);
[  157.549705]                                lock(fs_reclaim);
[  157.550011]   rlock(&q->q_usage_counter(io)#2);
[  157.550316] 
                *** DEADLOCK ***

[  157.551194] 2 locks held by git/2856:
[  157.551490]  #0: ffff98b6a221e068 (&pipe->mutex){+.+.}-{4:4}, at: pipe_write+0x5a/0x7f0
[  157.551798]  #1: ffffffffad65e1c0 (fs_reclaim){+.+.}-{0:0}, at: __alloc_pages_slowpath.constprop.0+0x348/0xea0
[  157.552115] 
               stack backtrace:
[  157.552734] CPU: 5 UID: 1000 PID: 2856 Comm: git Tainted: G     U             6.13.0-rc6+ #123
[  157.553060] Tainted: [U]=USER
[  157.553383] Hardware name: ASUS System Product Name/PRIME B560M-A AC, BIOS 2001 02/01/2023
[  157.553718] Call Trace:
[  157.554054]  <TASK>
[  157.554389]  dump_stack_lvl+0x6e/0xa0
[  157.554725]  print_circular_bug.cold+0x178/0x1be
[  157.555064]  check_noncircular+0x148/0x160
[  157.555408]  ? __pfx_stack_trace_consume_entry+0x10/0x10
[  157.555747]  ? unwind_get_return_address+0x23/0x40
[  157.556085]  __lock_acquire+0x1339/0x2180
[  157.556425]  lock_acquire+0xd0/0x2e0
[  157.556761]  ? __submit_bio+0x80/0x220
[  157.557110]  ? blk_mq_submit_bio+0x860/0xb60
[  157.557447]  ? lock_release+0xd2/0x2a0
[  157.557784]  blk_mq_submit_bio+0x88b/0xb60
[  157.558137]  ? __submit_bio+0x80/0x220
[  157.558476]  __submit_bio+0x80/0x220
[  157.558828]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[  157.559166]  ? submit_bio_noacct_nocheck+0x324/0x420
[  157.559504]  submit_bio_noacct_nocheck+0x324/0x420
[  157.559863]  swap_writepage+0x399/0x580
[  157.560205]  pageout+0x129/0x2d0
[  157.560542]  shrink_folio_list+0x5a0/0xd80
[  157.560879]  ? evict_folios+0x25d/0x7b0
[  157.561212]  evict_folios+0x27d/0x7b0
[  157.561546]  try_to_shrink_lruvec+0x21b/0x2b0
[  157.561890]  shrink_one+0x102/0x1f0
[  157.562222]  shrink_node+0xb8e/0x1300
[  157.562554]  ? shrink_node+0x9c1/0x1300
[  157.562915]  ? shrink_node+0xb64/0x1300
[  157.563245]  ? do_try_to_free_pages+0xb3/0x580
[  157.563576]  do_try_to_free_pages+0xb3/0x580
[  157.563922]  ? lock_release+0xd2/0x2a0
[  157.564252]  try_to_free_pages+0xfa/0x2a0
[  157.564583]  __alloc_pages_slowpath.constprop.0+0x36f/0xea0
[  157.564946]  ? lock_release+0xd2/0x2a0
[  157.565279]  __alloc_pages_noprof+0x34c/0x390
[  157.565613]  alloc_pages_mpol_noprof+0xd7/0x1c0
[  157.565952]  pipe_write+0x3fc/0x7f0
[  157.566283]  vfs_write+0x401/0x540
[  157.566615]  ksys_write+0xd1/0xf0
[  157.566980]  do_syscall_64+0x95/0x180
[  157.567312]  ? vfs_write+0x401/0x540
[  157.567642]  ? lockdep_hardirqs_on_prepare+0xdb/0x190
[  157.568001]  ? syscall_exit_to_user_mode+0x97/0x290
[  157.568331]  ? do_syscall_64+0xa1/0x180
[  157.568658]  ? do_syscall_64+0xa1/0x180
[  157.569012]  ? syscall_exit_to_user_mode+0x97/0x290
[  157.569337]  ? do_syscall_64+0xa1/0x180
[  157.569658]  ? do_user_addr_fault+0x397/0x720
[  157.569980]  ? trace_hardirqs_off+0x4b/0xc0
[  157.570300]  ? clear_bhb_loop+0x45/0xa0
[  157.570621]  ? clear_bhb_loop+0x45/0xa0
[  157.570968]  ? clear_bhb_loop+0x45/0xa0
[  157.571286]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  157.571605] RIP: 0033:0x7fdf1ec2d484
[  157.571966] Code: c7 00 16 00 00 00 b8 ff ff ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 f3 0f 1e fa 80 3d 45 9c 10 00 00 74 13 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 55 48 89 e5 48 83 ec 20 48 89
[  157.572322] RSP: 002b:00007ffd0eb6d068 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
[  157.572692] RAX: ffffffffffffffda RBX: 0000000000000331 RCX: 00007fdf1ec2d484
[  157.573093] RDX: 0000000000000331 RSI: 000055693fe2d660 RDI: 0000000000000001
[  157.573470] RBP: 00007ffd0eb6d090 R08: 000055693fdc6010 R09: 0000000000000007
[  157.573875] R10: 0000556941b97c70 R11: 0000000000000202 R12: 0000000000000331
[  157.574249] R13: 000055693fe2d660 R14: 00007fdf1ed305c0 R15: 00007fdf1ed2de80
[  157.574621]  </TASK>

> 
> Thanks,
> Ming
> 





