On Tue, Nov 29, 2022 at 07:04:50PM +1100, Dave Chinner wrote:
> On Mon, Nov 28, 2022 at 10:50:40PM -0800, Darrick J. Wong wrote:
> > On Tue, Nov 29, 2022 at 05:31:04PM +1100, Dave Chinner wrote:
> > > On Sun, Nov 27, 2022 at 10:36:29AM -0800, Darrick J. Wong wrote:
> > > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > >
> > > > I've been running near-continuous integration testing of online fsck,
> > > > and I've noticed that once a day, one of the ARM VMs will fail the test
> > > > with out of order records in the data fork.
> > > >
> > > > xfs/804 races fsstress with online scrub (aka scan but do not change
> > > > anything), so I think this might be a bug in the core xfs code.  This
> > > > also only seems to trigger if one runs the test for more than ~6 minutes
> > > > via TIME_FACTOR=13 or something.
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/tree/tests/xfs/804?h=djwong-wtf
> > > .....
> > > > So.  Fix this by moving the dqattach_locked call up, and add a comment
> > > > about how we must attach the dquots *before* sampling the data/cow fork
> > > > contents.
> > > >
> > > > Fixes: a526c85c2236 ("xfs: move xfs_file_iomap_begin_delay around") # goes further back than this
> > > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > > ---
> > > >  fs/xfs/xfs_iomap.c |   12 ++++++++----
> > > >  1 file changed, 8 insertions(+), 4 deletions(-)
> > > >
> > > > diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
> > > > index 1bdd7afc1010..d903f0586490 100644
> > > > --- a/fs/xfs/xfs_iomap.c
> > > > +++ b/fs/xfs/xfs_iomap.c
> > > > @@ -984,6 +984,14 @@ xfs_buffered_write_iomap_begin(
> > > >          if (error)
> > > >                  goto out_unlock;
> > > >
> > > > +        /*
> > > > +         * Attach dquots before we access the data/cow fork mappings, because
> > > > +         * this function can cycle the ILOCK.
> > > > +         */
> > > > +        error = xfs_qm_dqattach_locked(ip, false);
> > > > +        if (error)
> > > > +                goto out_unlock;
> > > > +
> > > >          /*
> > > >           * Search the data fork first to look up our source mapping.  We
> > > >           * always need the data fork map, as we have to return it to the
> > > > @@ -1071,10 +1079,6 @@ xfs_buffered_write_iomap_begin(
> > > >                  allocfork = XFS_COW_FORK;
> > > >          }
> > > >
> > > > -        error = xfs_qm_dqattach_locked(ip, false);
> > > > -        if (error)
> > > > -                goto out_unlock;
> > > > -
> > > >          if (eof && offset + count > XFS_ISIZE(ip)) {
> > > >                  /*
> > > >                   * Determine the initial size of the preallocation.
> > > >
> > > Why not attach the dquots before we call xfs_ilock_for_iomap()?
> > I wanted to minimize the number of xfs_ilock calls -- under the scheme
> > you outline, xfs_qm_dqattach will lock it once; a dquot cache miss
> > will drop and retake it; and then xfs_ilock_for_iomap would take it yet
> > again.  That's one more ilock song-and-dance than this patch does...
>
> True, but we don't have an extra lock cycle if the dquots are
> already attached to the inode - xfs_qm_dqattach() checks for
> attached dquots before it takes the ILOCK to attach them.  Hence if
> we are doing lots of small writes to a file, we only take this extra
> lock cycle for the first delalloc reservation that we make, not
> every single one....
>
> We have to do it this way for anything that runs an actual
> transaction (like the direct IO write path we take if an extent size
> hint is set) as we can't cycle the ILOCK within a transaction
> context, so the code is already optimised for the "dquots already
> attached" case....
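
(For anyone reading along: the "dquots already attached" fast path never
takes the ILOCK at all.  From memory, xfs_qm_dqattach() is shaped roughly
like this -- paraphrased, not a verbatim copy of fs/xfs/xfs_qm.c:)

int
xfs_qm_dqattach(
        struct xfs_inode        *ip)
{
        int                     error;

        /* quotas off, or all of the needed dquots already attached? */
        if (!xfs_qm_need_dqattach(ip))
                return 0;

        xfs_ilock(ip, XFS_ILOCK_EXCL);
        error = xfs_qm_dqattach_locked(ip, false);
        xfs_iunlock(ip, XFS_ILOCK_EXCL);

        return error;
}

So the extra ilock/iunlock only happens the first time we touch a file
whose dquots aren't in memory yet, not on every delalloc reservation.
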
<nod> In the end, I decided to rewrite the patch to call xfs_qm_dqattach at
the start of xfs_buffered_write_iomap_begin.  I'll send that shortly.

> > > That way we can just call xfs_qm_dqattach(ip, false) and just return
> > > on failure immediately. That's exactly what we do in the
> > > xfs_iomap_write_direct() path, and it avoids the need to mention
> > > anything about lock cycling because we just don't care
> > > about cycling the ILOCK to read in or allocate dquots before we
> > > start the real work that needs to be done...
> >
> > ...but I guess it's cleaner once you start assuming that dqattach has
> > grown its own NOWAIT flag.  I'd sorta prefer to commit this corruption
> > fix as it is and rearrange dqget with NOWAIT as a separate series since
> > Linus has already warned us[1] to get things done sooner than later.
> >
> > [1] https://lore.kernel.org/lkml/CAHk-=wgUZwX8Sbb8Zvm7FxWVfX6CGuE7x+E16VKoqL7Ok9vv7g@xxxxxxxxxxxxxx/
>
> <shrug>
>
> If that's your concern, then
>
> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>

Thanks! ;)

> However, as maintainer I was never concerned about being "too late
> in the cycle". I'd just push it into the for-next tree with a stable
> tag and when it gets merged in a couple of weeks the stable
> maintainers should notice it and backport it appropriately
> automatically....

<nod> Normally I wouldn't care about timing since it's a bugfix, but I
kinda want to get all these sharp ends wrapped up, to minimize the number
of fixes that we still have to work on for -rc1+ in January.

> For distro backports, merging into the XFS tree is good enough to be
> considered upstream as it's pretty much guaranteed to end up in the
> mainline tree once it's been merged by the maintainer....
> >
> > (OTOH it's already 6pm your time so I may very well be done with all
> > the quota nowait changes before you wake up :P)
>
> NOWAIT changes are definitely next cycle stuff :)

> > > Hmmmmm - this means there's a potential problem with IOCB_NOWAIT
> > > here - if the dquots are not in memory, we're going to drop and then
> > > retake the ILOCK_EXCL without trylocks, potentially blocking a task
> > > that should not get blocked. That's a separate problem, though, and
> > > we probably need to plumb NOWAIT through to the dquot lookup cache
> > > miss case to solve that.
> >
> > It wouldn't be that hard to turn that second parameter into the usual
> > uint flags argument, but I agree that's a separate patch.
>
> *nod*
>
> > How much you wanna bet the FB people have never turned on quota and
> > hence have not yet played whackanowait with that subsystem?
>
> No bet, we both know the odds. :/
>
> Indeed, set an extent size hint on a file and then run io_uring
> async buffered writes and watch all the massive long tail latencies
> that occur on the transaction reservations and btree block IO and
> locking in the allocation path....

Granted, I wonder what would

--D

> Cheers,
>
> Dave.
> -- 
> Dave Chinner
> david@xxxxxxxxxxxxx
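
P.S.  Since the "turn that second parameter into the usual uint flags
argument" idea came up above, here is a very rough sketch of one way it
could look.  The flag names and the NOWAIT plumbing below are made up
purely for illustration -- this is not from an actual patch:

/* hypothetical flags, illustration only */
#define XFS_DQATTACH_DOALLOC    (1u << 0)       /* create dquots if missing */
#define XFS_DQATTACH_NOWAIT     (1u << 1)       /* don't block on a cache miss */

int xfs_qm_dqattach_locked(struct xfs_inode *ip, unsigned int flags);

/* caller side, e.g. in xfs_buffered_write_iomap_begin(): */
        unsigned int    dqflags = 0;

        if (flags & IOMAP_NOWAIT)
                dqflags |= XFS_DQATTACH_NOWAIT;

        error = xfs_qm_dqattach_locked(ip, dqflags);
        if (error)
                goto out_unlock;

The dquot lookup slow path would then return -EAGAIN instead of dropping
and retaking ILOCK_EXCL under a caller that asked not to block.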