On Wed, Dec 18, 2024 at 06:51:31AM +0500, Mikhail Gavrilov wrote:
> Hi,
> After commit f1be1788a32e I see in the kernel log "possible circular
> locking dependency detected" with follow stack trace:
> [ 740.877178] ======================================================
> [ 740.877180] WARNING: possible circular locking dependency detected
> [ 740.877182] 6.13.0-rc3-f44d154d6e3d+ #392 Tainted: G W L
> [ 740.877184] ------------------------------------------------------
> [ 740.877186] btrfs-transacti/839 is trying to acquire lock:
> [ 740.877188] ffff888182336a50
> (&q->q_usage_counter(io)#2){++++}-{0:0}, at: __submit_bio+0x335/0x520
> [ 740.877197]
> but task is already holding lock:
> [ 740.877198] ffff8881826f7048 (btrfs-tree-00){++++}-{4:4}, at:
> btrfs_tree_read_lock_nested+0x27/0x170
> [ 740.877205]
> which lock already depends on the new lock.
>
> [ 740.877206]
> the existing dependency chain (in reverse order) is:
> [ 740.877207]
> -> #4 (btrfs-tree-00){++++}-{4:4}:
> [ 740.877211] lock_release+0x397/0xd90
> [ 740.877215] up_read+0x1b/0x30
> [ 740.877217] btrfs_search_slot+0x16c9/0x31f0
> [ 740.877220] btrfs_lookup_inode+0xa9/0x360
> [ 740.877222] __btrfs_update_delayed_inode+0x131/0x760
> [ 740.877225] btrfs_async_run_delayed_root+0x4bc/0x630
> [ 740.877226] btrfs_work_helper+0x1b5/0xa50
> [ 740.877228] process_one_work+0x899/0x14b0
> [ 740.877231] worker_thread+0x5e6/0xfc0
> [ 740.877233] kthread+0x2d2/0x3a0
> [ 740.877235] ret_from_fork+0x31/0x70
> [ 740.877238] ret_from_fork_asm+0x1a/0x30
> [ 740.877240]
> -> #3 (&delayed_node->mutex){+.+.}-{4:4}:
> [ 740.877244] __mutex_lock+0x1ab/0x12c0
> [ 740.877247] __btrfs_release_delayed_node.part.0+0xa0/0xd40
> [ 740.877249] btrfs_evict_inode+0x44d/0xc20
> [ 740.877252] evict+0x3a4/0x840
> [ 740.877255] dispose_list+0xf0/0x1c0
> [ 740.877257] prune_icache_sb+0xe3/0x160
> [ 740.877259] super_cache_scan+0x30d/0x4f0
> [ 740.877261] do_shrink_slab+0x349/0xd60
> [ 740.877264] shrink_slab+0x7a4/0xd20
> [ 740.877266] shrink_one+0x403/0x830
> [ 740.877268] shrink_node+0x2337/0x3a60
> [ 740.877270] balance_pgdat+0xa4f/0x1500
> [ 740.877272] kswapd+0x4f3/0x940
> [ 740.877274] kthread+0x2d2/0x3a0
> [ 740.877276] ret_from_fork+0x31/0x70
> [ 740.877278] ret_from_fork_asm+0x1a/0x30
> [ 740.877280]
> -> #2 (fs_reclaim){+.+.}-{0:0}:
> [ 740.877283] fs_reclaim_acquire+0xc9/0x110
> [ 740.877286] __kmalloc_noprof+0xeb/0x690
> [ 740.877288] sd_revalidate_disk.isra.0+0x4356/0x8e00
> [ 740.877291] sd_probe+0x869/0xfa0
> [ 740.877293] really_probe+0x1e0/0x8a0
> [ 740.877295] __driver_probe_device+0x18c/0x370
> [ 740.877297] driver_probe_device+0x4a/0x120
> [ 740.877299] __device_attach_driver+0x162/0x270
> [ 740.877300] bus_for_each_drv+0x115/0x1a0
> [ 740.877303] __device_attach_async_helper+0x1a0/0x240
> [ 740.877305] async_run_entry_fn+0x97/0x4f0
> [ 740.877307] process_one_work+0x899/0x14b0
> [ 740.877309] worker_thread+0x5e6/0xfc0
> [ 740.877310] kthread+0x2d2/0x3a0
> [ 740.877312] ret_from_fork+0x31/0x70
> [ 740.877314] ret_from_fork_asm+0x1a/0x30
> [ 740.877316]
> -> #1 (&q->limits_lock){+.+.}-{4:4}:
> [ 740.877320] __mutex_lock+0x1ab/0x12c0
> [ 740.877321] nvme_update_ns_info_block+0x476/0x2630 [nvme_core]
> [ 740.877332] nvme_update_ns_info+0xbe/0xa60 [nvme_core]
> [ 740.877339] nvme_alloc_ns+0x1589/0x2c40 [nvme_core]
> [ 740.877346] nvme_scan_ns+0x579/0x660 [nvme_core]
> [ 740.877353] async_run_entry_fn+0x97/0x4f0
> [ 740.877355] process_one_work+0x899/0x14b0
> [ 740.877357] worker_thread+0x5e6/0xfc0
> [ 740.877358] kthread+0x2d2/0x3a0
> [ 740.877360] ret_from_fork+0x31/0x70
> [ 740.877362] ret_from_fork_asm+0x1a/0x30
> [ 740.877364]
> -> #0 (&q->q_usage_counter(io)#2){++++}-{0:0}:

This is another deadlock caused by a dependency between q->limits_lock and
q->q_usage_counter, the same as the one under discussion here:

https://lore.kernel.org/linux-block/20241216080206.2850773-2-ming.lei@xxxxxxxxxx/

The dependency from queue_limits_start_update() to blk_mq_freeze_queue()
should be cut.

Thanks,
Ming
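
[Editor's note: for context on the shape of the problem, a minimal sketch of
the ordering that records the limits_lock -> q_usage_counter(io) edge closing
the cycle above. This is illustrative only, not the nvme code and not the
proposed fix; the function name and the logical_block_size tweak are made up,
only the block-layer helpers are real.]

#include <linux/blkdev.h>
#include <linux/blk-mq.h>

static int example_update_limits(struct request_queue *q, unsigned int lbs)
{
	struct queue_limits lim;
	int ret;

	/* queue_limits_start_update() takes q->limits_lock */
	lim = queue_limits_start_update(q);
	lim.logical_block_size = lbs;

	/*
	 * Freezing here waits for q->q_usage_counter(io) while
	 * q->limits_lock is still held: this is the dependency
	 * that needs to be cut.
	 */
	blk_mq_freeze_queue(q);

	/* applies @lim and drops q->limits_lock */
	ret = queue_limits_commit_update(q, &lim);

	blk_mq_unfreeze_queue(q);
	return ret;
}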