Re: [PATCH] xfs: Fix false positive lockdep warning with sb_internal & fs_reclaim

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Thu, 2 Jan 2020 10:24:35 -0800

On Thu, Jan 02, 2020 at 11:19:51AM -0500, Qian Cai wrote:
> 
> 
> > On Jan 2, 2020, at 10:52 AM, Waiman Long <longman@xxxxxxxxxx> wrote:
> > 
> > Depending on the workloads, the following circular locking dependency
> > warning between sb_internal (a percpu rwsem) and fs_reclaim (a pseudo
> > lock) may show up:
> > 
> > ======================================================
> > WARNING: possible circular locking dependency detected
> > 5.0.0-rc1+ #60 Tainted: G        W
> > ------------------------------------------------------
> > fsfreeze/4346 is trying to acquire lock:
> > 0000000026f1d784 (fs_reclaim){+.+.}, at:
> > fs_reclaim_acquire.part.19+0x5/0x30
> > 
> > but task is already holding lock:
> > 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
> > 
> > which lock already depends on the new lock.
> >  :
> > Possible unsafe locking scenario:
> > 
> >       CPU0                    CPU1
> >       ----                    ----
> >  lock(sb_internal);
> >                               lock(fs_reclaim);
> >                               lock(sb_internal);
> >  lock(fs_reclaim);
> > 
> > *** DEADLOCK ***
> > 
> > 4 locks held by fsfreeze/4346:
> > #0: 00000000b478ef56 (sb_writers#8){++++}, at: percpu_down_write+0xb4/0x650
> > #1: 000000001ec487a9 (&type->s_umount_key#28){++++}, at: freeze_super+0xda/0x290
> > #2: 000000003edbd5a0 (sb_pagefaults){++++}, at: percpu_down_write+0xb4/0x650
> > #3: 0000000072bfc54b (sb_internal){++++}, at: percpu_down_write+0xb4/0x650
> > 
> > stack backtrace:
> > Call Trace:
> > dump_stack+0xe0/0x19a
> > print_circular_bug.isra.10.cold.34+0x2f4/0x435
> > check_prev_add.constprop.19+0xca1/0x15f0
> > validate_chain.isra.14+0x11af/0x3b50
> > __lock_acquire+0x728/0x1200
> > lock_acquire+0x269/0x5a0
> > fs_reclaim_acquire.part.19+0x29/0x30
> > fs_reclaim_acquire+0x19/0x20
> > kmem_cache_alloc+0x3e/0x3f0
> > kmem_zone_alloc+0x79/0x150
> > xfs_trans_alloc+0xfa/0x9d0
> > xfs_sync_sb+0x86/0x170
> > xfs_log_sbcount+0x10f/0x140
> > xfs_quiesce_attr+0x134/0x270
> > xfs_fs_freeze+0x4a/0x70
> > freeze_super+0x1af/0x290
> > do_vfs_ioctl+0xedc/0x16c0
> > ksys_ioctl+0x41/0x80
> > __x64_sys_ioctl+0x73/0xa9
> > do_syscall_64+0x18f/0xd23
> > entry_SYSCALL_64_after_hwframe+0x49/0xbe
> > 
> > According to Dave Chinner:
> > 
> >  Freezing the filesystem, after all the data has been cleaned. IOWs
> >  memory reclaim will never run the above writeback path when
> >  the freeze process is trying to allocate a transaction here because
> >  there are no dirty data pages in the filesystem at this point.
> > 
> >  Indeed, this xfs_sync_sb() path sets XFS_TRANS_NO_WRITECOUNT so that
> >  it /doesn't deadlock/ by taking freeze references for the
> >  transaction. We've just drained all the transactions
> >  in progress and written back all the dirty metadata, too, and so the
> >  filesystem is completely clean and only needs the superblock to be
> >  updated to complete the freeze process. And to do that, it does not
> >  take a freeze reference because calling sb_start_intwrite() here
> >  would deadlock.
> > 
> >  IOWs, this is a false positive, caused by the fact that
> >  xfs_trans_alloc() is called from both above and below memory reclaim
> >  as well as within /every level/ of freeze processing. Lockdep is
> >  unable to describe the staged flush logic in the freeze process that
> >  prevents deadlocks from occurring, and hence we will pretty much
> >  always see false positives in the freeze path....
> > 
> > Perhaps breaking the fs_reclaim pseudo lock into a per filesystem lock
> > may fix the issue. However, that will greatly complicate the logic and
> > may not be worth it.
> > 
> > Another way to fix it is to disable the taking of the fs_reclaim
> > pseudo lock when in the freezing code path as a reclaim on the freezed
> > filesystem is not possible as stated above. This patch takes this
> > approach by setting the __GFP_NOLOCKDEP flag in the slab memory
> > allocation calls when the filesystem has been freezed.
> > 
> > Without this patch, the command sequence below will show that the lock
> > dependency chain sb_internal -> fs_reclaim exists.
> > 
> > # fsfreeze -f /home
> > # fsfreeze --unfreeze /home
> > # grep -i fs_reclaim -C 3 /proc/lockdep_chains | grep -C 5 sb_internal
> > 
> > After applying the patch, such sb_internal -> fs_reclaim lock dependency
> > chain can no longer be found. Because of that, the locking dependency
> > warning will not be shown.
> 
> There was an attempt to fix this in the past, but Dave rejected right
> away for any workaround in xfs and insisted to make lockdep smarter
> instead. No sure your approach will make any difference this time.
> Good luck.

/me wonders if you can fix this by having the freeze path call
memalloc_nofs_save() since we probably don't want to be recursing into
the fs for reclaim while freezing it?  Probably not, because that's a
bigger hammer than we really need here.  We can certainly steal memory
from other filesystems that aren't frozen.

It doesn't solve the key issue that lockdep isn't smart enough to know
that we can't recurse into the fs that's being frozen and therefore
there's no chance of deadlock.

--D