On Tue, Jan 16, 2024 at 09:59:45AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> In the past we've had problems with lockdep false positives stemming
> from inode locking occurring in memory reclaim contexts (e.g. from
> superblock shrinkers). Lockdep doesn't know that inodes access from
> above memory reclaim cannot be accessed from below memory reclaim
> (and vice versa) but there has never been a good solution to solving
> this problem with lockdep annotations.
>
> This situation isn't unique to inode locks - buffers are also locked
> above and below memory reclaim, and we have to maintain lock
> ordering for them - and against inodes - appropriately. IOWs, the
> same code paths and locks are taken both above and below memory
> reclaim and so we always need to make sure the lock orders are
> consistent. We are spared the lockdep problems this might cause
> by the fact that semaphores and bit locks aren't covered by lockdep.
>
> In general, this sort of lockdep false positive detection is cause
> by code that runs GFP_KERNEL memory allocation with an actively
> referenced inode locked. When it is run from a transaction, memory
> allocation is automatically GFP_NOFS, so we don't have reclaim
> recursion issues. So in the places where we do memory allocation
> with inodes locked outside of a transaction, we have explicitly set
> them to use GFP_NOFS allocations to prevent lockdep false positives
> from being reported if the allocation dips into direct memory
> reclaim.
>
> More recently, __GFP_NOLOCKDEP was added to the memory allocation
> flags to tell lockdep not to track that particular allocation for
> the purposes of reclaim recursion detection. This is a much better
> way of preventing false positives - it allows us to use GFP_KERNEL
> context outside of transactions, and allows direct memory reclaim to
> proceed normally without throwing out false positive deadlock
> warnings.
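
For reference, the flag semantics described in the quoted text can be sketched in a small userspace model. This is not kernel code: the MODEL_* names and bit values are hypothetical and chosen only for illustration, and the kernel's real GFP definitions differ. The sketch separates two questions: whether an allocation may recurse into filesystem reclaim, and whether lockdep's reclaim recursion tracking covers it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values for illustration only, not the kernel's. */
#define MODEL_GFP_IO        0x1u
#define MODEL_GFP_FS        0x2u
#define MODEL_GFP_RECLAIM   0x4u
#define MODEL_GFP_NOLOCKDEP 0x8u

/* Same composition as in the kernel: NOFS is KERNEL without the FS bit. */
#define MODEL_GFP_KERNEL (MODEL_GFP_RECLAIM | MODEL_GFP_IO | MODEL_GFP_FS)
#define MODEL_GFP_NOFS   (MODEL_GFP_RECLAIM | MODEL_GFP_IO)

/* May direct reclaim for this allocation call into filesystem shrinkers? */
static bool model_may_enter_fs_reclaim(unsigned int gfp)
{
	return (gfp & MODEL_GFP_RECLAIM) && (gfp & MODEL_GFP_FS);
}

/* Does the reclaim recursion tracking annotate this allocation? */
static bool model_lockdep_tracks_reclaim(unsigned int gfp)
{
	return model_may_enter_fs_reclaim(gfp) && !(gfp & MODEL_GFP_NOLOCKDEP);
}

/* Walks through the three cases discussed in this thread. */
void model_gfp_demo(void)
{
	unsigned int nolockdep = MODEL_GFP_KERNEL | MODEL_GFP_NOLOCKDEP;

	/* GFP_NOFS: no recursion into filesystem reclaim at all. */
	assert(!model_may_enter_fs_reclaim(MODEL_GFP_NOFS));
	/* GFP_KERNEL: recursion possible, and tracked. */
	assert(model_lockdep_tracks_reclaim(MODEL_GFP_KERNEL));
	/* GFP_KERNEL | NOLOCKDEP: recursion still possible, tracking off. */
	assert(model_may_enter_fs_reclaim(nolockdep));
	assert(!model_lockdep_tracks_reclaim(nolockdep));
}
```

Under this model, adding the NOLOCKDEP bit to a GFP_KERNEL allocation switches off only the reclaim tracking; the allocation can still reach filesystem shrinkers, whereas GFP_NOFS prevents that recursion itself.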

Hi Dave,

I recently hit the following AA deadlock lockdep warning on Linux 6.9.0,
which already includes your patch set. I believe it is a lockdep false
positive.

xfs_dir_lookup_args() runs in a non-transactional context and allocates
memory with the __GFP_NOLOCKDEP flag in xfs_buf_alloc_pages(). Although
__GFP_NOLOCKDEP tells lockdep not to track that particular allocation
for the purposes of reclaim recursion detection, it cannot completely
replace GFP_NOFS. If the allocation falls into direct memory reclaim, it
may still trigger the AA deadlock warning shown below.

Or am I mistaken somewhere? I look forward to your reply.

Thanks,
Long Li

[12051.255974][ T6480] ============================================
[12051.256590][ T6480] WARNING: possible recursive locking detected
[12051.257207][ T6480] 6.9.0-xfstests-12131-gb902367d6fde-dirty #747 Not tainted
[12051.257919][ T6480] --------------------------------------------
[12051.258513][ T6480] cc1/6480 is trying to acquire lock:
[12051.259017][ T6480] ffff88804f40a018 (&xfs_dir_ilock_class){++++}-{3:3}, at: xfs_icwalk_ag+0x7c0/0x1690
[12051.259926][ T6480]
[12051.259926][ T6480] but task is already holding lock:
[12051.260599][ T6480] ffff8881004b5658 (&xfs_dir_ilock_class){++++}-{3:3}, at: xfs_ilock_data_map_shared+0x52/0x70
[12051.261546][ T6480]
[12051.261546][ T6480] other info that might help us debug this:
[12051.262288][ T6480] Possible unsafe locking scenario:
[12051.262288][ T6480]
[12051.262972][ T6480] CPU0
[12051.263283][ T6480] ----
[12051.263587][ T6480] lock(&xfs_dir_ilock_class);
[12051.264048][ T6480] lock(&xfs_dir_ilock_class);
[12051.264502][ T6480]
[12051.264502][ T6480] *** DEADLOCK ***
[12051.264502][ T6480]
[12051.265267][ T6480] May be due to missing lock nesting notation
[12051.265267][ T6480]
[12051.266052][ T6480] 3 locks held by cc1/6480:
[12051.266477][ T6480] #0: ffff8881004b5878 (&inode->i_sb->s_type->i_mutex_dir_key){++++}-{3:3}, at: path_openat+0xaa4/0x1090
[12051.267526][ T6480] #1: ffff8881004b5658 (&xfs_dir_ilock_class){++++}-{3:3}, at: xfs_ilock_data_map_shared+0x52/0x70
[12051.268528][ T6480] #2: ffff888107fda0e0 (&type->s_umount_key#42){.+.+}-{3:3}, at: super_trylock_shared+0x1c/0xb0
[12051.269511][ T6480]
[12051.269511][ T6480] stack backtrace:
[12051.270092][ T6480] CPU: 2 PID: 6480 Comm: cc1 Not tainted 6.9.0-xfstests-12131-gb902367d6fde-dirty #747
[12051.271012][ T6480] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_073836-buildvm-ppc64le-16.ppc.fedoraproject.org-3.fc31 04/01/2014
[12051.272321][ T6480] Call Trace:
[12051.272640][ T6480] <TASK>
[12051.272913][ T6480] dump_stack_lvl+0x82/0xd0
[12051.273347][ T6480] validate_chain+0xe70/0x1d30
[12051.274765][ T6480] __lock_acquire+0xd9a/0x1e90
[12051.275208][ T6480] lock_acquire+0x1a9/0x4f0
[12051.277032][ T6480] down_write_nested+0x9b/0x200
[12051.279413][ T6480] xfs_icwalk_ag+0x7c0/0x1690
[12051.284326][ T6480] xfs_icwalk+0x4f/0xe0
[12051.284735][ T6480] xfs_reclaim_inodes_nr+0x148/0x1f0
[12051.285792][ T6480] super_cache_scan+0x30c/0x440
[12051.286247][ T6480] do_shrink_slab+0x340/0xce0
[12051.286701][ T6480] shrink_slab_memcg+0x231/0x8f0
[12051.289127][ T6480] shrink_slab+0x4ad/0x4f0
[12051.290620][ T6480] shrink_node+0x86b/0x1de0
[12051.291055][ T6480] do_try_to_free_pages+0x2c4/0x1490
[12051.293643][ T6480] try_to_free_pages+0x20d/0x540
[12051.294641][ T6480] __alloc_pages_slowpath.constprop.0+0x754/0x2050
[12051.299337][ T6480] __alloc_pages_noprof+0x54f/0x660
[12051.301344][ T6480] alloc_pages_bulk_noprof+0x6fb/0xe00
[12051.302404][ T6480] xfs_buf_alloc_pages+0x1b9/0x850
[12051.302889][ T6480] xfs_buf_get_map+0xe86/0x1590
[12051.303847][ T6480] xfs_buf_read_map+0xb6/0x7f0
[12051.306234][ T6480] xfs_trans_read_buf_map+0x474/0xd30
[12051.307753][ T6480] xfs_da_read_buf+0x1c8/0x2c0
[12051.310298][ T6480] xfs_dir3_data_read+0x36/0x2e0
[12051.310783][ T6480] xfs_dir2_leafn_lookup_for_entry+0x3d6/0x14b0
[12051.313039][ T6480] xfs_da3_node_lookup_int+0xef1/0x1810
[12051.315658][ T6480] xfs_dir2_node_lookup+0xc5/0x580
[12051.317156][ T6480] xfs_dir_lookup_args+0xbf/0xe0
[12075.149236][ T5555] new_slab+0x2c4/0x320
[12075.149602][ T5555] ___slab_alloc+0xcdd/0x1640
[12075.152775][ T5555] __slab_alloc.isra.0+0x1f/0x40
[12075.153238][ T5555] kmem_cache_alloc_noprof+0x34f/0x3a0
[12075.154130][ T5555] vm_area_dup+0x51/0x160
[12075.154772][ T5555] __split_vma+0x135/0x1930
[12075.158003][ T5555] vma_modify+0x228/0x300
[12075.158380][ T5555] mprotect_fixup+0x1a0/0x950
[12075.159252][ T5555] do_mprotect_pkey+0x79c/0xa40
[12075.161063][ T5555] __x64_sys_mprotect+0x78/0xc0
[12075.161492][ T5555] do_syscall_64+0x66/0x140
[12075.161891][ T5555] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[12075.162409][ T5555] RIP: 0033:0x7ff736f2bc5b
[12075.162811][ T5555] Code: 73 01 c3 48 8d 0d a5 15 01 00 f7 d8 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 f3 0f 1e fa b8 0a 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 83
[12075.164490][ T5555] RSP: 002b:00007ffc9c420998 EFLAGS: 00000206 ORIG_RAX: 000000000000000a
[12075.165229][ T5555] RAX: ffffffffffffffda RBX: 00007ff736f3ca30 RCX: 00007ff736f2bc5b
[12075.165937][ T5555] RDX: 0000000000000001 RSI: 0000000000002000 RDI: 00007ff736f3a000
[12075.166644][ T5555] RBP: 00007ffc9c420ab0 R08: 0000000000000000 R09: 0000000000000000
[12075.167349][ T5555] R10: 00007ff736f09000 R11: 0000000000000206 R12: 0000000000000000
[12075.168061][ T5555] R13: 00007ff736f3b9e0 R14: 00007ff736f3ca30 R15: 00007ff736f09000
[12075.168773][ T5555] </TASK>
[12075.169090][ T5555] Mem-Info:
[12075.169378][ T5555] active_anon:6735 inactive_anon:1469067 isolated_anon:0
[12075.169378][ T5555] active_file:24 inactive_file:508 isolated_file:424
[12075.169378][ T5555] unevictable:0 dirty:1 writeback:0
[12075.169378][ T5555] slab_reclaimable:56327 slab_unreclaimable:112381
[12075.169378][ T5555] mapped:718 shmem:275 pagetables:53700
[12075.169378][ T5555] sec_pagetables:0 bounce:0
[12075.169378][ T5555] kernel_misc_reclaimable:0
[12075.169378][ T5555] free:11595 free_pcp:554 free_cma:0
[12075.173320][ T5555] Node 0 active_anon:26940kB inactive_anon:5876268kB active_file:96kB inactive_file:2032kB unevictable:0kB isolated(anon):0kB isolated(file):1696kB mapped:2872kB dirtyo
[12075.175767][ T5555] Node 0 DMA free:20kB boost:0kB min:20kB low:32kB high:44kB reserved_highatomic:0KB active_anon:68kB inactive_anon:15272kB active_file:0kB inactive_file:0kB unevictabB
[12075.178094][ T5555] lowmem_reserve[]: 0 2895 6821 0 0
[12075.178559][ T5555] Node 0 DMA32 free:19760kB boost:15612kB min:20092kB low:23056kB high:26020kB reserved_highatomic:0KB active_anon:12592kB inactive_anon:2416704kB active_file:0kB inacB
[12075.181075][ T5555] lowmem_reserve[]: 0 0 3925 0 0
[12075.181517][ T5555] Node 0 Normal free:26600kB boost:21156kB min:27228kB low:31244kB high:35260kB reserved_highatomic:0KB active_anon:14280kB inactive_anon:3444292kB active_file:120kB iB
[12075.184140][ T5555] lowmem_reserve[]: 0 0 0 0 0
[12075.184554][ T5555] Node 0 DMA: 1*4kB (U) 2*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20kB
[12075.185606][ T5555] Node 0 DMA32: 85*4kB (UME) 45*8kB (UME) 12*16kB (UME) 6*32kB (UE) 300*64kB (UE) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 20284kB
[12075.186882][ T5555] Node 0 Normal: 0*4kB 1*8kB (U) 0*16kB 299*32kB (U) 247*64kB (UE) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 25384kB
[12075.188047][ T5555] 1152 total pagecache pages
[12075.188450][ T5555] 0 pages in swap cache
[12075.188821][ T5555] Free swap = 0kB
[12075.189201][ T5555] Total swap = 0kB
[12075.189768][ T5555] 2097018 pages RAM
[12075.190103][ T5555] 0 pages HighMem/MovableOnly
[12075.190509][ T5555] 345037 pages reserved

>
> The obvious places that lock inodes and do memory allocation are the
> lookup paths and inode extent list initialisation. These occur in
> non-transactional GFP_KERNEL contexts, and so can run direct reclaim
> and lock inodes.
>
> This patch makes a first path through all the explicit GFP_NOFS
> allocations in XFS and converts the obvious ones to GFP_KERNEL |
> __GFP_NOLOCKDEP as a first step towards removing explicit GFP_NOFS
> allocations from the XFS code.
>
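
The "possible recursive locking" report in this message names the same lock class, xfs_dir_ilock_class, for two different lock addresses. A simplified userspace model of a class-based check of that kind might look like the sketch below. All model_* names are hypothetical, and the model assumes a warning whenever a lock of the same class and subclass is already held; real lockdep handles more cases (recursive readers, nest locks) that are left out here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_HELD 8

/* One entry in a task's held-lock stack (hypothetical layout). */
struct model_held_lock {
	const void *instance;	/* address of the lock instance */
	int class_id;		/* lock class the instance belongs to */
	int subclass;		/* nesting subclass, 0 when not annotated */
};

struct model_task {
	struct model_held_lock held[MODEL_MAX_HELD];
	size_t depth;
};

/* True if a lock with the same class and subclass is already held. */
static bool model_would_warn(const struct model_task *t, int class_id,
			     int subclass)
{
	for (size_t i = 0; i < t->depth; i++) {
		if (t->held[i].class_id == class_id &&
		    t->held[i].subclass == subclass)
			return true;
	}
	return false;
}

/* Records the acquisition; returns true if it would have been reported. */
static bool model_acquire(struct model_task *t, const void *instance,
			  int class_id, int subclass)
{
	bool warn = model_would_warn(t, class_id, subclass);

	assert(t->depth < MODEL_MAX_HELD);
	t->held[t->depth].instance = instance;
	t->held[t->depth].class_id = class_id;
	t->held[t->depth].subclass = subclass;
	t->depth++;
	return warn;
}
```

In this model, taking a second directory inode lock of the same class and subclass is flagged even though the instance differs, because the instance address never enters the comparison. A different subclass on the second acquisition is not flagged.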