On Thu, Sep 14, 2023 at 8:21 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> Breno and Josef report a deadlock scenario from cgroup reclaim
> re-entering the filesystem:
>
> [ 361.546690] ======================================================
> [ 361.559210] WARNING: possible circular locking dependency detected
> [ 361.571703] 6.5.0-0_fbk700_debug_rc0_kbuilder_13159_gbf787a128001 #1 Tainted: G S E
> [ 361.589704] ------------------------------------------------------
> [ 361.602277] find/9315 is trying to acquire lock:
> [ 361.611625] ffff88837ba140c0 (&delayed_node->mutex){+.+.}-{4:4}, at: __btrfs_release_delayed_node+0x68/0x4f0
> [ 361.631437]
> [ 361.631437] but task is already holding lock:
> [ 361.643243] ffff8881765b8678 (btrfs-tree-01){++++}-{4:4}, at: btrfs_tree_read_lock+0x1e/0x40
>
> [ 362.904457] mutex_lock_nested+0x1c/0x30
> [ 362.912414] __btrfs_release_delayed_node+0x68/0x4f0
> [ 362.922460] btrfs_evict_inode+0x301/0x770
> [ 362.982726] evict+0x17c/0x380
> [ 362.988944] prune_icache_sb+0x100/0x1d0
> [ 363.005559] super_cache_scan+0x1f8/0x260
> [ 363.013695] do_shrink_slab+0x2a2/0x540
> [ 363.021489] shrink_slab_memcg+0x237/0x3d0
> [ 363.050606] shrink_slab+0xa7/0x240
> [ 363.083382] shrink_node_memcgs+0x262/0x3b0
> [ 363.091870] shrink_node+0x1a4/0x720
> [ 363.099150] shrink_zones+0x1f6/0x5d0
> [ 363.148798] do_try_to_free_pages+0x19b/0x5e0
> [ 363.157633] try_to_free_mem_cgroup_pages+0x266/0x370
> [ 363.190575] reclaim_high+0x16f/0x1f0
> [ 363.208409] mem_cgroup_handle_over_high+0x10b/0x270
> [ 363.246678] try_charge_memcg+0xaf2/0xc70
> [ 363.304151] charge_memcg+0xf0/0x350
> [ 363.320070] __mem_cgroup_charge+0x28/0x40
> [ 363.328371] __filemap_add_folio+0x870/0xd50
> [ 363.371303] filemap_add_folio+0xdd/0x310
> [ 363.399696] __filemap_get_folio+0x2fc/0x7d0
> [ 363.419086] pagecache_get_page+0xe/0x30
> [ 363.427048] alloc_extent_buffer+0x1cd/0x6a0
> [ 363.435704] read_tree_block+0x43/0xc0
> [ 363.443316] read_block_for_search+0x361/0x510
> [ 363.466690] btrfs_search_slot+0xc8c/0x1520
>
> This is caused by the mem_cgroup_handle_over_high() not respecting the
> gfp_mask of the allocation context. We used to only call this function
> on resume to userspace, where no locks were held. But c9afe31ec443
> ("memcg: synchronously enforce memory.high for large overcharges")
> added a call from the allocation context without considering the gfp.
>
> Reported-by: Breno Leitao <leitao@xxxxxxxxxx>
> Reported-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> Fixes: c9afe31ec443 ("memcg: synchronously enforce memory.high for large overcharges")
> Cc: stable@xxxxxxxxxxxxxxx # 5.17+
> Signed-off-by: Johannes Weiner <hannes@xxxxxxxxxxx>

Acked-by: Shakeel Butt <shakeelb@xxxxxxxxxx>