On Mon, Nov 28, 2022 at 10:50:40PM -0800, Darrick J. Wong wrote:
> On Tue, Nov 29, 2022 at 05:31:04PM +1100, Dave Chinner wrote:
> > On Sun, Nov 27, 2022 at 10:36:29AM -0800, Darrick J. Wong wrote:
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > 
> > > I've been running near-continuous integration testing of online
> > > fsck, and I've noticed that once a day, one of the ARM VMs will
> > > fail the test with out of order records in the data fork.
> > > 
> > > xfs/804 races fsstress with online scrub (aka scan but do not
> > > change anything), so I think this might be a bug in the core xfs
> > > code.  This also only seems to trigger if one runs the test for
> > > more than ~6 minutes via TIME_FACTOR=13 or something.
> > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/tree/tests/xfs/804?h=djwong-wtf
> > .....
> > > So.  Fix this by moving the dqattach_locked call up, and add a
> > > comment about how we must attach the dquots *before* sampling the
> > > data/cow fork contents.
> > > 
> > > Fixes: a526c85c2236 ("xfs: move xfs_file_iomap_begin_delay around") # goes further back than this
> > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > ---
> > >  fs/xfs/xfs_iomap.c |   12 ++++++++----
> > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > index 1bdd7afc1010..d903f0586490 100644
> > > --- a/fs/xfs/xfs_iomap.c
> > > +++ b/fs/xfs/xfs_iomap.c
> > > @@ -984,6 +984,14 @@ xfs_buffered_write_iomap_begin(
> > >  	if (error)
> > >  		goto out_unlock;
> > >  
> > > +	/*
> > > +	 * Attach dquots before we access the data/cow fork mappings, because
> > > +	 * this function can cycle the ILOCK.
> > > +	 */
> > > +	error = xfs_qm_dqattach_locked(ip, false);
> > > +	if (error)
> > > +		goto out_unlock;
> > > +
> > >  	/*
> > >  	 * Search the data fork first to look up our source mapping.  We
> > >  	 * always need the data fork map, as we have to return it to the
> > > @@ -1071,10 +1079,6 @@ xfs_buffered_write_iomap_begin(
> > >  		allocfork = XFS_COW_FORK;
> > >  	}
> > >  
> > > -	error = xfs_qm_dqattach_locked(ip, false);
> > > -	if (error)
> > > -		goto out_unlock;
> > > -
> > >  	if (eof && offset + count > XFS_ISIZE(ip)) {
> > >  		/*
> > >  		 * Determine the initial size of the preallocation.
> > 
> > Why not attach the dquots before we call xfs_ilock_for_iomap()?
> 
> I wanted to minimize the number of xfs_ilock calls -- under the scheme
> you outline, xfs_qm_dqattach will lock it once; a dquot cache miss
> will drop and retake it; and then xfs_ilock_for_iomap would take it
> yet again.  That's one more ilock song-and-dance than this patch
> does...

True, but we don't have an extra lock cycle if the dquots are already
attached to the inode - xfs_qm_dqattach() checks for attached dquots
before it takes the ILOCK to attach them.  Hence if we are doing lots
of small writes to a file, we only take this extra lock cycle for the
first delalloc reservation that we make, not for every single one....

We have to do it this way for anything that runs an actual transaction
(like the direct IO write path we take if an extent size hint is set)
because we can't cycle the ILOCK within a transaction context, so the
code is already optimised for the "dquots already attached" case....
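FWIW, the "already attached" check sits right at the top of
xfs_qm_dqattach() - from memory it looks something like this (the
comments are mine, and the details may not exactly match the current
tree):

int
xfs_qm_dqattach(
	struct xfs_inode	*ip)
{
	int			error;

	/*
	 * Fast path: if quotas are off or all the required dquots
	 * are already attached to the inode, there is nothing to do
	 * and we never touch the ILOCK at all.
	 */
	if (!xfs_qm_need_dqattach(ip))
		return 0;

	/* Slow path: cycle the ILOCK to look up and attach dquots. */
	xfs_ilock(ip, XFS_ILOCK_EXCL);
	error = xfs_qm_dqattach_locked(ip, false);
	xfs_iunlock(ip, XFS_ILOCK_EXCL);

	return error;
}

i.e. steady state buffered writes never see the extra lock cycle at
all.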
> > That way we can just call xfs_qm_dqattach(ip, false) and just
> > return on failure immediately.  That's exactly what we do in the
> > xfs_iomap_write_direct() path, and it avoids the need to mention
> > anything about lock cycling because we just don't care about
> > cycling the ILOCK to read in or allocate dquots before we start
> > the real work that needs to be done...
> 
> ...but I guess it's cleaner once you start assuming that dqattach has
> grown its own NOWAIT flag.  I'd sorta prefer to commit this
> corruption fix as it is and rearrange dqget with NOWAIT as a separate
> series since Linus has already warned us[1] to get things done sooner
> rather than later.
> 
> [1] https://lore.kernel.org/lkml/CAHk-=wgUZwX8Sbb8Zvm7FxWVfX6CGuE7x+E16VKoqL7Ok9vv7g@xxxxxxxxxxxxxx/

<shrug> If that's your concern, then

Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

However, as maintainer I was never concerned about this being "too
late in the cycle".  I'd just push it into the for-next tree with a
stable tag, and when it gets merged in a couple of weeks the stable
maintainers should notice the tag and backport it automatically....

For distro backports, merging into the XFS tree is good enough to be
considered upstream, as it's pretty much guaranteed to end up in the
mainline tree once it's been merged by the maintainer....

> (OTOH it's already 6pm your time so I may very well be done with all
> the quota nowait changes before you wake up :P)

NOWAIT changes are definitely next cycle stuff :)

> > Hmmmmm - this means there's a potential problem with IOCB_NOWAIT
> > here - if the dquots are not in memory, we're going to drop and
> > then retake the ILOCK_EXCL without trylocks, potentially blocking
> > a task that should not get blocked.  That's a separate problem,
> > though, and we probably need to plumb NOWAIT through to the dquot
> > lookup cache miss case to solve that.
> 
> It wouldn't be that hard to turn that second parameter into the usual
> uint flags argument, but I agree that's a separate patch.

*nod*

> How much you wanna bet the FB people have never turned on quota and
> hence have not yet played whackanowait with that subsystem?

No bet, we both know the odds. :/

Indeed, set an extent size hint on a file, then run io_uring async
buffered writes and watch all the massive long tail latencies that
occur on transaction reservations, btree block IO and locking in the
allocation path....

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx