On Wed, Mar 24, 2021 at 09:31:58AM +1100, Dave Chinner wrote:
> On Wed, Mar 10, 2021 at 07:06:41PM -0800, Darrick J. Wong wrote:
> > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > 
> > Create a polled version of xfs_inactive_force so that we can force
> > inactivation while holding a lock (usually the umount lock) without
> > tripping over the softlockup timer.  This is for callers that hold vfs
> > locks while calling inactivation, which is currently unmount, iunlink
> > processing during mount, and rw->ro remount.
> > 
> > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > ---
> >  fs/xfs/xfs_icache.c |   38 +++++++++++++++++++++++++++++++++++++-
> >  fs/xfs/xfs_icache.h |    1 +
> >  fs/xfs/xfs_mount.c  |    2 +-
> >  fs/xfs/xfs_mount.h  |    5 +++++
> >  fs/xfs/xfs_super.c  |    3 ++-
> >  5 files changed, 46 insertions(+), 3 deletions(-)
> > 
> > 
> > diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
> > index d5f580b92e48..9db2beb4e732 100644
> > --- a/fs/xfs/xfs_icache.c
> > +++ b/fs/xfs/xfs_icache.c
> > @@ -25,6 +25,7 @@
> >  #include "xfs_ialloc.h"
> >  
> >  #include <linux/iversion.h>
> > +#include <linux/nmi.h>
> 
> This stuff goes in fs/xfs/xfs_linux.h, not here.
> 
> >  
> >  /*
> >   * Allocate and initialise an xfs_inode.
> > @@ -2067,8 +2068,12 @@ xfs_inodegc_free_space(
> >  	struct xfs_mount	*mp,
> >  	struct xfs_eofblocks	*eofb)
> >  {
> > -	return xfs_inode_walk(mp, XFS_INODE_WALK_INACTIVE,
> > +	int			error;
> > +
> > +	error = xfs_inode_walk(mp, XFS_INODE_WALK_INACTIVE,
> >  			xfs_inactive_inode, eofb, XFS_ICI_INACTIVE_TAG);
> > +	wake_up(&mp->m_inactive_wait);
> > +	return error;
> >  }
> >  
> >  /* Try to get inode inactivation moving. */
> > @@ -2138,6 +2143,37 @@ xfs_inodegc_force(
> >  	flush_workqueue(mp->m_gc_workqueue);
> >  }
> >  
> > +/*
> > + * Force all inode inactivation work to run immediately, and poll until the
> > + * work is complete.  Callers should only use this function if they must
> > + * inactivate inodes while holding VFS locks, and must be prepared to prevent
> > + * or to wait for inodes that are queued for inactivation while this runs.
> > + */
> > +void
> > +xfs_inodegc_force_poll(
> > +	struct xfs_mount	*mp)
> > +{
> > +	struct xfs_perag	*pag;
> > +	xfs_agnumber_t		agno;
> > +	bool			queued = false;
> > +
> > +	for_each_perag_tag(mp, agno, pag, XFS_ICI_INACTIVE_TAG)
> > +		queued |= xfs_inodegc_force_pag(pag);
> > +	if (!queued)
> > +		return;
> > +
> > +	/*
> > +	 * Touch the softlockup watchdog every 1/10th of a second while there
> > +	 * are still inactivation-tagged inodes in the filesystem.
> > +	 */
> > +	while (!wait_event_timeout(mp->m_inactive_wait,
> > +			!radix_tree_tagged(&mp->m_perag_tree,
> > +				XFS_ICI_INACTIVE_TAG),
> > +			HZ / 10)) {
> > +		touch_softlockup_watchdog();
> > +	}
> > +}
> 
> This looks like a deadlock waiting to be tripped over. As long as
> there is something still able to queue inodes for inactivation,
> that radix tree tag check will always trigger and put us back to
> sleep.

Yes, I know this is a total livelock vector.  This ugly function exists
to avoid stall warnings when the VFS has called us with s_umount held
and there are a lot of inodes to process.

As the function comment points out, callers must prevent anyone else
from inactivating inodes or be prepared to deal with the consequences,
which the current callers are prepared to do.  I can't think of a
better way to handle this, since we need to bail out of the workqueue
flush periodically to make the softlockup thing happy.
Alternately we could just let the stall warnings happen and deal with
the people who file a bug for every stack trace they see the kernel
emit.

> Also, in terms of workqueues, this is a "sync flush" because we
> are waiting for it. e.g. the difference between cancel_work() and
> cancel_work_sync() is that the latter waits for all the work in
> progress to complete before returning and the former doesn't wait...

Yeah, I'll change all the xfs_inodegc_force_* -> xfs_inodegc_flush_*.

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
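
[Editor's note: for readers following the "sync flush" discussion above,
here is a minimal sketch of the distinction, not taken from the patch.
The demo_* names are hypothetical; only the overall pattern (async kick
vs. blocking flush vs. the polled flush that touches the watchdog, as in
xfs_inodegc_force_poll() above) is what the thread is describing.]

#include <linux/workqueue.h>
#include <linux/wait.h>
#include <linux/jiffies.h>
#include <linux/nmi.h>

static DECLARE_WAIT_QUEUE_HEAD(demo_done_wq);
static bool demo_done;

static void demo_worker(struct work_struct *work)
{
	/* ... process whatever has been queued up ... */
	WRITE_ONCE(demo_done, true);
	wake_up(&demo_done_wq);
}
static DECLARE_WORK(demo_work, demo_worker);

/* Async kick: schedule the background work and return immediately. */
static void demo_kick(struct workqueue_struct *wq)
{
	queue_work(wq, &demo_work);
}

/* Plain sync flush: block in one long sleep until all queued work is done. */
static void demo_flush(struct workqueue_struct *wq)
{
	queue_work(wq, &demo_work);
	flush_workqueue(wq);
}

/*
 * Polled sync flush: sleep in 1/10th-second slices and touch the softlockup
 * watchdog between them so a long wait under a held VFS lock does not emit
 * stall warnings.  As with the patch above, this assumes the caller prevents
 * anyone else from queueing more work while it runs.
 */
static void demo_flush_poll(struct workqueue_struct *wq)
{
	WRITE_ONCE(demo_done, false);
	queue_work(wq, &demo_work);

	while (!wait_event_timeout(demo_done_wq, READ_ONCE(demo_done), HZ / 10))
		touch_softlockup_watchdog();
}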