Re: [PATCH 7/7] block: don't grab q->debugfs_mutex

On Feb 10, 2025 / 12:31, Nilay Shroff wrote:
> 
> 
> On 2/10/25 7:12 AM, Ming Lei wrote:
> > On Mon, Feb 10, 2025 at 8:52 AM Shinichiro Kawasaki
> > <shinichiro.kawasaki@xxxxxxx> wrote:
> >>
> >> On Feb 09, 2025 / 20:20, Ming Lei wrote:
> >>> All block internal state for adding/removing debugfs entries has been
> >>> removed, and debugfs can synchronize everything for us at the fs level,
> >>> so don't grab q->debugfs_mutex for adding/removing block internal
> >>> debugfs entries.
> >>>
> >>> Now q->debugfs_mutex is only used for blktrace; meanwhile, move the
> >>> code that creates the queue debugfs dir out of q->sysfs_lock. Both
> >>> locks are connected with the queue freeze IO lock, so the lock chain
> >>> from the queue freeze IO lock to the debugfs lock is cut.
> >>>
> >>> The following lockdep report can be fixed:
> >>>
> >>> https://lore.kernel.org/linux-block/ougniadskhks7uyxguxihgeuh2pv4yaqv4q3emo4gwuolgzdt6@brotly74p6bs/
> >>>
> >>> The following contexts add/remove debugfs entries:
> >>>
> >>> - update nr_hw_queues
> >>>
> >>> - add/remove disks
> >>>
> >>> - elevator switch
> >>>
> >>> - blktrace
> >>>
> >>> blktrace only adds entries under the disk top directory, so we can
> >>> ignore it, because it can only work iff the disk is added. Also,
> >>> nothing overlaps with the other two contexts, so the blktrace context
> >>> is fine.
> >>>
> >>> Elevator switch is only allowed after the disk is added, so there is
> >>> no race with disk add/remove. blk_mq_update_nr_hw_queues() always
> >>> restores the previous elevator, so there is no race between these two
> >>> either. The elevator switch context is fine.
> >>>
> >>> So far blk_mq_update_nr_hw_queues() doesn't hold the debugfs lock when
> >>> adding/removing hctx entries, so there might be a race with disk
> >>> add/remove, which is just fine in reality:
> >>>
> >>> - blk_mq_update_nr_hw_queues() is usually used for error recovery, and
> >>> the disk won't be added/removed at the same time
> >>>
> >>> - even if there is a race between the two contexts, it is still fine,
> >>> since the hctx won't be freed until the queue is dead
> >>>
> >>> - we have never seen reports in this area, even without holding the
> >>> debugfs lock in blk_mq_update_nr_hw_queues()
> >>>
> >>> Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@xxxxxxx>
> >>> Signed-off-by: Ming Lei <ming.lei@xxxxxxxxxx>
> >>
> >> Ming, thank you for this quick action. I applied this series on top of
> >> the v6.14-rc1 kernel and ran the block/002 test case. Unfortunately, it
> >> still fails occasionally with the lockdep "WARNING: possible circular
> >> locking dependency detected" below. Now debugfs_mutex is no longer
> >> reported as one of the dependent locks, so I think this fix is working
> >> as expected. Instead, eq->sysfs_lock creates a similar dependency. My
> >> guess is that this patch avoids one dependency, but another dependency
> >> remains.
> > 
> > Indeed, this patch cuts the dependency on both q->sysfs_lock and q->debugfs_lock,
> Glad to see that with this patch we're able to cut the dependency between
> q->sysfs_lock and q->debugfs_lock.
> 
> > but elevator ->sysfs_lock isn't covered, :-(
> > 
> I believe that shall be fixed by the effort currently underway here:
> https://lore.kernel.org/all/20250205144506.663819-1-nilay@xxxxxxxxxxxxx/

Thanks Nilay for the information. I applied your patch series on top of
v6.14-rc1 and Ming's. Then I ran block/002. Alas, the test case still fails
with the "WARNING: possible circular locking dependency detected". The lockdep
splat contents have changed as below:


[  222.549099] [   T1041] run blktests block/002 at 2025-02-10 17:13:18
[  222.887999] [   T1072] sd 9:0:0:0: [sdd] Synchronizing SCSI cache
[  223.172864] [   T1073] scsi_debug:sdebug_driver_probe: scsi_debug: trim poll_queues to 0. poll_q/nr_hw = (0/1)
[  223.173872] [   T1073] scsi host9: scsi_debug: version 0191 [20210520]
                            dev_size_mb=8, opts=0x0, submit_queues=1, statistics=0
[  223.177251] [   T1073] scsi 9:0:0:0: Direct-Access     Linux    scsi_debug       0191 PQ: 0 ANSI: 7
[  223.179393] [      C0] scsi 9:0:0:0: Power-on or device reset occurred
[  223.185351] [   T1073] sd 9:0:0:0: Attached scsi generic sg3 type 0
[  223.187514] [    T101] sd 9:0:0:0: [sdd] 16384 512-byte logical blocks: (8.39 MB/8.00 MiB)
[  223.189675] [    T101] sd 9:0:0:0: [sdd] Write Protect is off
[  223.190146] [    T101] sd 9:0:0:0: [sdd] Mode Sense: 73 00 10 08
[  223.192369] [    T101] sd 9:0:0:0: [sdd] Write cache: enabled, read cache: enabled, supports DPO and FUA
[  223.199999] [    T101] sd 9:0:0:0: [sdd] permanent stream count = 5
[  223.203950] [    T101] sd 9:0:0:0: [sdd] Preferred minimum I/O size 512 bytes
[  223.204568] [    T101] sd 9:0:0:0: [sdd] Optimal transfer size 524288 bytes
[  223.245371] [    T101] sd 9:0:0:0: [sdd] Attached SCSI disk

[  223.999340] [   T1097] ======================================================
[  224.000044] [   T1097] WARNING: possible circular locking dependency detected
[  224.000651] [   T1097] 6.14.0-rc1+ #255 Not tainted
[  224.001109] [   T1097] ------------------------------------------------------
[  224.001791] [   T1097] blktrace/1097 is trying to acquire lock:
[  224.002220] [   T1097] ffff88814c53bde0 (&mm->mmap_lock){++++}-{4:4}, at: __might_fault+0x99/0x120
[  224.002887] [   T1097] 
                          but task is already holding lock:
[  224.003423] [   T1097] ffff88810079dc68 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: relay_file_read+0xa3/0x8a0
[  224.004173] [   T1097] 
                          which lock already depends on the new lock.

[  224.004934] [   T1097] 
                          the existing dependency chain (in reverse order) is:
[  224.007356] [   T1097] 
                          -> #6 (&sb->s_type->i_mutex_key#3){++++}-{4:4}:
[  224.009859] [   T1097]        down_write+0x8d/0x200
[  224.011109] [   T1097]        start_creating.part.0+0x82/0x230
[  224.012465] [   T1097]        debugfs_create_dir+0x3a/0x4c0
[  224.013793] [   T1097]        blk_mq_debugfs_register_rqos+0x1a5/0x3a0
[  224.015101] [   T1097]        rq_qos_add+0x2f6/0x450
[  224.016312] [   T1097]        wbt_init+0x359/0x490
[  224.017507] [   T1097]        blk_register_queue+0x1c0/0x330
[  224.018800] [   T1097]        add_disk_fwnode+0x6b1/0x1010
[  224.020047] [   T1097]        sd_probe+0x94e/0xf30
[  224.021235] [   T1097]        really_probe+0x1e3/0x8a0
[  224.022402] [   T1097]        __driver_probe_device+0x18c/0x370
[  224.023652] [   T1097]        driver_probe_device+0x4a/0x120
[  224.024816] [   T1097]        __device_attach_driver+0x15e/0x270
[  224.026038] [   T1097]        bus_for_each_drv+0x114/0x1a0
[  224.027212] [   T1097]        __device_attach_async_helper+0x19c/0x240
[  224.028476] [   T1097]        async_run_entry_fn+0x96/0x4f0
[  224.029664] [   T1097]        process_one_work+0x85a/0x1460
[  224.030798] [   T1097]        worker_thread+0x5e2/0xfc0
[  224.032037] [   T1097]        kthread+0x39d/0x750
[  224.033136] [   T1097]        ret_from_fork+0x30/0x70
[  224.034232] [   T1097]        ret_from_fork_asm+0x1a/0x30
[  224.035360] [   T1097] 
                          -> #5 (&q->rq_qos_mutex){+.+.}-{4:4}:
[  224.037338] [   T1097]        __mutex_lock+0x1aa/0x1360
[  224.038397] [   T1097]        wbt_init+0x343/0x490
[  224.039497] [   T1097]        blk_register_queue+0x1c0/0x330
[  224.040612] [   T1097]        add_disk_fwnode+0x6b1/0x1010
[  224.041675] [   T1097]        sd_probe+0x94e/0xf30
[  224.042662] [   T1097]        really_probe+0x1e3/0x8a0
[  224.043672] [   T1097]        __driver_probe_device+0x18c/0x370
[  224.044728] [   T1097]        driver_probe_device+0x4a/0x120
[  224.045761] [   T1097]        __device_attach_driver+0x15e/0x270
[  224.046818] [   T1097]        bus_for_each_drv+0x114/0x1a0
[  224.047842] [   T1097]        __device_attach_async_helper+0x19c/0x240
[  224.048876] [   T1097]        async_run_entry_fn+0x96/0x4f0
[  224.049840] [   T1097]        process_one_work+0x85a/0x1460
[  224.050794] [   T1097]        worker_thread+0x5e2/0xfc0
[  224.051718] [   T1097]        kthread+0x39d/0x750
[  224.052653] [   T1097]        ret_from_fork+0x30/0x70
[  224.053548] [   T1097]        ret_from_fork_asm+0x1a/0x30
[  224.054492] [   T1097] 
                          -> #4 (&q->sysfs_lock){+.+.}-{4:4}:
[  224.056073] [   T1097]        __mutex_lock+0x1aa/0x1360
[  224.056934] [   T1097]        elv_iosched_store+0xf2/0x500
[  224.057809] [   T1097]        queue_attr_store+0x35a/0x480
[  224.058680] [   T1097]        kernfs_fop_write_iter+0x39e/0x5a0
[  224.059593] [   T1097]        vfs_write+0x5f9/0xe90
[  224.060449] [   T1097]        ksys_write+0xf6/0x1c0
[  224.061294] [   T1097]        do_syscall_64+0x93/0x180
[  224.062153] [   T1097]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  224.063087] [   T1097] 
                          -> #3 (&q->q_usage_counter(io)#2){++++}-{0:0}:
[  224.064675] [   T1097]        blk_mq_submit_bio+0x19b8/0x2070
[  224.065562] [   T1097]        __submit_bio+0x36b/0x6d0
[  224.066428] [   T1097]        submit_bio_noacct_nocheck+0x542/0xca0
[  224.067371] [   T1097]        iomap_readahead+0x4c4/0x7e0
[  224.068254] [   T1097]        read_pages+0x198/0xaf0
[  224.069085] [   T1097]        page_cache_ra_order+0x4d3/0xb50
[  224.069976] [   T1097]        filemap_get_pages+0x2c7/0x1850
[  224.070862] [   T1097]        filemap_read+0x31d/0xc30
[  224.071707] [   T1097]        xfs_file_buffered_read+0x1e9/0x2a0 [xfs]
[  224.072927] [   T1097]        xfs_file_read_iter+0x25f/0x4a0 [xfs]
[  224.074093] [   T1097]        vfs_read+0x6bc/0xa20
[  224.074918] [   T1097]        ksys_read+0xf6/0x1c0
[  224.075735] [   T1097]        do_syscall_64+0x93/0x180
[  224.076576] [   T1097]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  224.077534] [   T1097] 
                          -> #2 (mapping.invalidate_lock#2){++++}-{4:4}:
[  224.079118] [   T1097]        down_read+0x9b/0x470
[  224.079950] [   T1097]        filemap_fault+0x231/0x2ac0
[  224.080818] [   T1097]        __do_fault+0xf4/0x5d0
[  224.081658] [   T1097]        do_fault+0x965/0x11d0
[  224.082495] [   T1097]        __handle_mm_fault+0x1060/0x1f40
[  224.083408] [   T1097]        handle_mm_fault+0x21a/0x6b0
[  224.084297] [   T1097]        do_user_addr_fault+0x322/0xaa0
[  224.085205] [   T1097]        exc_page_fault+0x7a/0x110
[  224.086061] [   T1097]        asm_exc_page_fault+0x22/0x30
[  224.086941] [   T1097] 
                          -> #1 (&vma->vm_lock->lock){++++}-{4:4}:
[  224.088495] [   T1097]        down_write+0x8d/0x200
[  224.089353] [   T1097]        vma_link+0x1ff/0x590
[  224.090200] [   T1097]        insert_vm_struct+0x15a/0x340
[  224.091072] [   T1097]        alloc_bprm+0x626/0xbf0
[  224.091909] [   T1097]        kernel_execve+0x85/0x2f0
[  224.092754] [   T1097]        call_usermodehelper_exec_async+0x21b/0x430
[  224.093738] [   T1097]        ret_from_fork+0x30/0x70
[  224.094587] [   T1097]        ret_from_fork_asm+0x1a/0x30
[  224.095541] [   T1097] 
                          -> #0 (&mm->mmap_lock){++++}-{4:4}:
[  224.097077] [   T1097]        __lock_acquire+0x2f52/0x5ea0
[  224.097963] [   T1097]        lock_acquire+0x1b1/0x540
[  224.098818] [   T1097]        __might_fault+0xb9/0x120
[  224.099672] [   T1097]        _copy_to_user+0x1e/0x80
[  224.100519] [   T1097]        relay_file_read+0x149/0x8a0
[  224.101408] [   T1097]        full_proxy_read+0x110/0x1d0
[  224.102297] [   T1097]        vfs_read+0x1bb/0xa20
[  224.103115] [   T1097]        ksys_read+0xf6/0x1c0
[  224.103937] [   T1097]        do_syscall_64+0x93/0x180
[  224.104782] [   T1097]        entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  224.105726] [   T1097] 
                          other info that might help us debug this:

[  224.107884] [   T1097] Chain exists of:
                            &mm->mmap_lock --> &q->rq_qos_mutex --> &sb->s_type->i_mutex_key#3

[  224.110335] [   T1097]  Possible unsafe locking scenario:

[  224.111856] [   T1097]        CPU0                    CPU1
[  224.112723] [   T1097]        ----                    ----
[  224.113588] [   T1097]   lock(&sb->s_type->i_mutex_key#3);
[  224.114492] [   T1097]                                lock(&q->rq_qos_mutex);
[  224.115499] [   T1097]                                lock(&sb->s_type->i_mutex_key#3);
[  224.116609] [   T1097]   rlock(&mm->mmap_lock);
[  224.117440] [   T1097] 
                           *** DEADLOCK ***

[  224.119479] [   T1097] 2 locks held by blktrace/1097:
[  224.120325] [   T1097]  #0: ffff88812e5b80f8 (&f->f_pos_lock){+.+.}-{4:4}, at: fdget_pos+0x233/0x2f0
[  224.121474] [   T1097]  #1: ffff88810079dc68 (&sb->s_type->i_mutex_key#3){++++}-{4:4}, at: relay_file_read+0xa3/0x8a0
[  224.122741] [   T1097] 
                          stack backtrace:
[  224.124186] [   T1097] CPU: 0 UID: 0 PID: 1097 Comm: blktrace Not tainted 6.14.0-rc1+ #255
[  224.124189] [   T1097] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-3.fc41 04/01/2014
[  224.124193] [   T1097] Call Trace:
[  224.124195] [   T1097]  <TASK>
[  224.124197] [   T1097]  dump_stack_lvl+0x6a/0x90
[  224.124201] [   T1097]  print_circular_bug.cold+0x1e0/0x274
[  224.124205] [   T1097]  check_noncircular+0x306/0x3f0
[  224.124209] [   T1097]  ? __pfx_check_noncircular+0x10/0x10
[  224.124213] [   T1097]  ? lockdep_lock+0xca/0x1c0
[  224.124215] [   T1097]  ? __pfx_lockdep_lock+0x10/0x10
[  224.124219] [   T1097]  __lock_acquire+0x2f52/0x5ea0
[  224.124224] [   T1097]  ? find_held_lock+0x2d/0x110
[  224.124226] [   T1097]  ? __pfx___lock_acquire+0x10/0x10
[  224.124230] [   T1097]  ? lock_acquire+0x1b1/0x540
[  224.124233] [   T1097]  lock_acquire+0x1b1/0x540
[  224.124236] [   T1097]  ? __might_fault+0x99/0x120
[  224.124239] [   T1097]  ? __pfx_lock_acquire+0x10/0x10
[  224.124242] [   T1097]  ? lock_is_held_type+0xd5/0x130
[  224.124245] [   T1097]  ? __pfx___might_resched+0x10/0x10
[  224.124248] [   T1097]  ? _raw_spin_unlock+0x29/0x50
[  224.124250] [   T1097]  ? __might_fault+0x99/0x120
[  224.124252] [   T1097]  __might_fault+0xb9/0x120
[  224.124254] [   T1097]  ? __might_fault+0x99/0x120
[  224.124256] [   T1097]  ? __check_object_size+0x3f3/0x540
[  224.124258] [   T1097]  _copy_to_user+0x1e/0x80
[  224.124261] [   T1097]  relay_file_read+0x149/0x8a0
[  224.124266] [   T1097]  ? selinux_file_permission+0x36d/0x420
[  224.124270] [   T1097]  full_proxy_read+0x110/0x1d0
[  224.124272] [   T1097]  ? rw_verify_area+0x2f7/0x520
[  224.124274] [   T1097]  vfs_read+0x1bb/0xa20
[  224.124277] [   T1097]  ? __pfx___mutex_lock+0x10/0x10
[  224.124279] [   T1097]  ? __pfx_lock_release+0x10/0x10
[  224.124282] [   T1097]  ? __pfx_vfs_read+0x10/0x10
[  224.124286] [   T1097]  ? __fget_files+0x1ae/0x2e0
[  224.124290] [   T1097]  ksys_read+0xf6/0x1c0
[  224.124293] [   T1097]  ? __pfx_ksys_read+0x10/0x10
[  224.124297] [   T1097]  do_syscall_64+0x93/0x180
[  224.124299] [   T1097]  ? __pfx___rseq_handle_notify_resume+0x10/0x10
[  224.124301] [   T1097]  ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
[  224.124305] [   T1097]  ? __pfx___rseq_handle_notify_resume+0x10/0x10
[  224.124306] [   T1097]  ? __pfx_blkcg_maybe_throttle_current+0x10/0x10
[  224.124311] [   T1097]  ? lockdep_hardirqs_on_prepare+0x16d/0x400
[  224.124313] [   T1097]  ? do_syscall_64+0x9f/0x180
[  224.124315] [   T1097]  ? lockdep_hardirqs_on+0x78/0x100
[  224.124318] [   T1097]  ? do_syscall_64+0x9f/0x180
[  224.124319] [   T1097]  ? lockdep_hardirqs_on_prepare+0x16d/0x400
[  224.124322] [   T1097]  ? do_syscall_64+0x9f/0x180
[  224.124324] [   T1097]  ? lockdep_hardirqs_on+0x78/0x100
[  224.124326] [   T1097]  ? do_syscall_64+0x9f/0x180
[  224.124328] [   T1097]  ? lockdep_hardirqs_on+0x78/0x100
[  224.124330] [   T1097]  ? do_syscall_64+0x9f/0x180
[  224.124333] [   T1097]  ? handle_mm_fault+0x39d/0x6b0
[  224.124337] [   T1097]  ? lockdep_hardirqs_on_prepare+0x16d/0x400
[  224.124341] [   T1097]  entry_SYSCALL_64_after_hwframe+0x76/0x7e
[  224.124344] [   T1097] RIP: 0033:0x7fdbd297be4a
[  224.124348] [   T1097] Code: 55 48 89 e5 48 83 ec 20 48 89 55 e8 48 89 75 f0 89 7d f8 e8 28 58 f8 ff 48 8b 55 e8 48 8b 75 f0 41 89 c0 8b 7d f8 31 c0 0f 05 <48> 3d 00 f0 ff ff 77 2e 44 89 c7 48 89 45 f8 e8 82 58 f8 ff 48 8b
[  224.124350] [   T1097] RSP: 002b:00007fdbd2869d80 EFLAGS: 00000246 ORIG_RAX: 0000000000000000
[  224.124354] [   T1097] RAX: ffffffffffffffda RBX: 00007fdbcc000bb0 RCX: 00007fdbd297be4a
[  224.124356] [   T1097] RDX: 0000000000080000 RSI: 00007fdbc3000000 RDI: 0000000000000004
[  224.124357] [   T1097] RBP: 00007fdbd2869da0 R08: 0000000000000000 R09: 0000000000000000
[  224.124358] [   T1097] R10: 0000000000000001 R11: 0000000000000246 R12: 0000000000000000
[  224.124360] [   T1097] R13: 000055e34ca547e0 R14: 0000000000000000 R15: 00007fdbcc002c90
[  224.124364] [   T1097]  </TASK>
[  224.220675] [   T1041] sd 9:0:0:0: [sdd] Synchronizing SCSI cache



