Re: [PATCH 06/11] xfs: deferred inode inactivation

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Wed, 24 Mar 2021 21:20:44 -0700

On Wed, Mar 24, 2021 at 03:57:06PM +1100, Dave Chinner wrote:
> On Tue, Mar 23, 2021 at 07:04:07PM -0700, Darrick J. Wong wrote:
> > On Tue, Mar 23, 2021 at 04:19:07PM +1100, Dave Chinner wrote:
> > > On Mon, Mar 22, 2021 at 09:00:37PM -0700, Darrick J. Wong wrote:
> > > > On Tue, Mar 23, 2021 at 12:44:17PM +1100, Dave Chinner wrote:
> > > > > On Wed, Mar 10, 2021 at 07:06:13PM -0800, Darrick J. Wong wrote:
> > > > > > +	/*
> > > > > > +	 * Not a match for our passed in scan filter?  Put it back on the shelf
> > > > > > +	 * and move on.
> > > > > > +	 */
> > > > > > +	spin_lock(&ip->i_flags_lock);
> > > > > > +	if (!xfs_inode_matches_eofb(ip, eofb)) {
> > > > > > +		ip->i_flags &= ~XFS_INACTIVATING;
> > > > > > +		spin_unlock(&ip->i_flags_lock);
> > > > > > +		return 0;
> > > > > > +	}
> > > > > > +	spin_unlock(&ip->i_flags_lock);
> > > > > 
> > > > > IDGI. What do EOF blocks have to do with running inode inactivation
> > > > > on this inode?
> > > > 
> > > > This enables foreground threads that hit EDQUOT to look for inodes to
> > > > inactivate in order to free up quota'd resources.
> > > 
> > > Not very obvious - better comment, please?
> > 
> > 	/*
> > 	 * Foreground threads that have hit ENOSPC or EDQUOT are allowed
> > 	 * to pass in a eofb structure to look for inodes to inactivate
> > 	 * immediately to free some resources.  If this inode isn't a
> > 	 * match, put it back on the shelf and move on.
> > 	 */
> > 
> > Better?
> 
> Yes.
> 
> > > > > > +	/*
> > > > > > +	 * Perform all on-disk metadata updates required to inactivate inodes.
> > > > > > +	 * Since this can involve finobt updates, do it now before we lose the
> > > > > > +	 * per-AG space reservations.
> > > > > > +	 */
> > > > > > +	xfs_inodegc_force(mp);
> > > > > 
> > > > > Should we stop background inactivation, because we can't make
> > > > > modifications anymore and hence background inactication makes little
> > > > > sense...

Ahhh, now I remember why the blockgc and inodegc workers call
sb_start_write before running any transactions.  We don't want the
threads to stall on transaction allocation when the fs is at FREEZE_FS,
which means that we have to cancel the work before we get there.  That
means it's too late to cancel the work items in xfs_fs_freeze.

We can't cancel the work items from a ->freeze_super handler before
calling freeze_super(), because we haven't taken any locks yet, and
we're still unfrozen.

For blockgc I solved this problem by making the worker get FREEZE_WRITE
protection so that we can't freeze the fs until the work is done.  Then
we don't have to care that much about ensuring that the worker threads
cannot run while the fs is frozen.  But that's a bit sloppy, since
they're still consuming CPU time.

I could solve this problem by observing that freeze_super calls
sync_filesystem when the fs is in FREEZE_PAGEFAULTS and is about to move
to FREEZE_FS, but that seems ugly and hacky.

> > > > 
> > > > We don't actually stop background gc transactions or other internal
> > > > updates on readonly filesystems
> > > 
> > > Yes we do - that's what xfs_blockgc_stop() higher up in this
> > > function does. xfs_log_clean() further down in the function also
> > > stops the background log work (that covers the log when idle)
> > > because xfs_remount_ro() leaves the log clean.
> > > 
> > > THese all get restarted in xfs_remount_rw()....
> > > 
> > > > -- the ro part means only that we don't
> > > > let /userspace/ change anything directly.  If you open a file readonly,
> > > > unlink it, freeze the fs, and close the file, we'll still free it.
> > > 
> > > How do you unlink the file on a RO mount?
> > 
> > I got confused here.  If you open a file readonly on a rw mount, unlink
> > it, remount the fs readonly, and close the file, we'll still free it.
> 
> Not even that way. :)
> 
> You can't remount-ro while there are open-but-unlinked files. See
> sb->s_remove_count. It's incremented when drop_link() drops the link
> count to zero in the unlink() syscall, then decremented when
> __destroy_inode() is called during inode eviction when the final
> reference goes away. Hence while we have open but unlinked inodes in
> active use, that superblock counter is non-zero.
> 
> In sb_prepare_remount_readonly() we have:
> 
> 	if (atomic_long_read(&sb->s_remove_count))
> 		return -EBUSY;
> 
> So a remount-ro will fail with -EBUSY while there are open but
> unlinked files.

Ah, ok.

> Except, of course, if you are doing an emergency remount-ro from
> sysrq, in which case these open-but-unlinked checks are not done,
> but when we are forcing the fs to be read-only this way, it's not
> being done for correctness (i.e the system is about to be shot down)
> so we don't really care...

Well yes, most bets are off during emergency ro-remounts. :)

--D

> 
> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx