On Sun, Jan 06, 2019 at 12:28:39AM -0500, Qian Cai wrote:
> It looks like due to 8683edb7755 (xfs: avoid lockdep false positives in
> xfs_trans_alloc), it triggers lockdep in some other ways.
>
> [81388.050050] WARNING: possible circular locking dependency detected
> [81388.056272] 4.20.0+ #47 Tainted: G W L
> [81388.061182] ------------------------------------------------------
> [81388.067402] fsfreeze/64059 is trying to acquire lock:
> [81388.072487] 000000004f938084 (fs_reclaim){+.+.}, at:
> fs_reclaim_acquire.part.19+0x5/0x30
> [81388.080649]
> [81388.080649] but task is already holding lock:
> [81388.086517] 00000000339e9c6f (sb_internal){++++}, at:
> percpu_down_write+0xbb/0x410
> [81388.094140]
> [81388.094140] which lock already depends on the new lock.
> [81388.094140]
> [81388.102367]
> [81388.102367] the existing dependency chain (in reverse order) is:
> [81388.109897]
> [81388.109897] -> #1 (sb_internal){++++}:
> [81388.115163]        __lock_acquire+0x460/0x850
> [81388.119549]        lock_acquire+0x1e0/0x3f0
> [81388.123764]        __sb_start_write+0x150/0x1e0
> [81388.128437]        xfs_trans_alloc+0x49b/0x5e0 [xfs]
> [81388.133540]        xfs_setfilesize_trans_alloc+0xa6/0x1a0 [xfs]
> [81388.139602]        xfs_submit_ioend+0x239/0x3e0 [xfs]
> [81388.144790]        xfs_vm_writepage+0xbc/0x100 [xfs]
> [81388.149793]        pageout.isra.2+0x919/0x13c0
> [81388.154264]        shrink_page_list+0x3807/0x58a0
> [81388.158997]        shrink_inactive_list+0x4b3/0xfc0
> [81388.163909]        shrink_node_memcg+0x5e5/0x1660
> [81388.168642]        shrink_node+0x2a3/0xaa0
> [81388.172766]        balance_pgdat+0x7cc/0xea0
> [81388.177067]        kswapd+0x65e/0xc40
> [81388.180757]        kthread+0x1d2/0x1f0
> [81388.184535]        ret_from_fork+0x27/0x50

Writeback of data from kswapd, allocating a transaction. This is such
a horrible thing to be doing from many, many perspectives. /me
recently proposed a patch to remove ->writepage from XFS to avoid
this sort of crap altogether.
> [81388.188655]
> [81388.188655] -> #0 (fs_reclaim){+.+.}:
> [81388.193832]        validate_chain.isra.14+0xd43/0x1910
> [81388.199004]        __lock_acquire+0x460/0x850
> [81388.203391]        lock_acquire+0x1e0/0x3f0
> [81388.207602]        fs_reclaim_acquire.part.19+0x29/0x30
> [81388.212862]        fs_reclaim_acquire+0x19/0x20
> [81388.217424]        kmem_cache_alloc+0x2f/0x330
> [81388.222004]        kmem_zone_alloc+0x6e/0x110 [xfs]
> [81388.227023]        xfs_trans_alloc+0xfd/0x5e0 [xfs]
> [81388.232034]        xfs_sync_sb+0x76/0x100 [xfs]
> [81388.236701]        xfs_log_sbcount+0x8e/0xa0 [xfs]
> [81388.241631]        xfs_quiesce_attr+0x112/0x1d0 [xfs]
> [81388.246821]        xfs_fs_freeze+0x38/0x50 [xfs]
> [81388.251469]        freeze_super+0x122/0x190
> [81388.255682]        do_vfs_ioctl+0xa04/0xbe0

Freezing the filesystem, after all the data has been cleaned. IOWs,
memory reclaim will never run the above writeback path while the
freeze process is allocating a transaction here, because there are no
dirty data pages left in the filesystem at this point.

Indeed, this xfs_sync_sb() path sets XFS_TRANS_NO_WRITECOUNT so that
it /doesn't deadlock/ by taking freeze references for the
transaction. We've just drained all the transactions in progress and
written back all the dirty metadata, too, so the filesystem is
completely clean and only needs the superblock to be updated to
complete the freeze process. And to do that, it does not take a
freeze reference, because calling sb_start_intwrite() here would
deadlock.

IOWs, this is a false positive, caused by the fact that
xfs_trans_alloc() is called both above and below memory reclaim, as
well as within /every level/ of freeze processing. Lockdep is unable
to describe the staged flush logic in the freeze process that
prevents deadlocks from occurring, and hence we will pretty much
always see false positives in the freeze path....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx