On Tue, Nov 29, 2022 at 01:05:24PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@xxxxxxxxxx>
>
> I've been running near-continuous integration testing of online fsck,
> and I've noticed that once a day, one of the ARM VMs will fail the test
> with out of order records in the data fork.
>
> xfs/804 races fsstress with online scrub (aka scan but do not change
> anything), so I think this might be a bug in the core xfs code. This
> also only seems to trigger if one runs the test for more than ~6 minutes
> via TIME_FACTOR=13 or something.
> https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfstests-dev.git/tree/tests/xfs/804?h=djwong-wtf
>
> I added a debugging patch to the kernel to check the data fork extents
> after taking the ILOCK, before dropping ILOCK, and before and after each
> bmapping operation. So far I've narrowed it down to the delalloc code
> inserting a record in the wrong place in the iext tree:
.....
>
> So. Fix this by moving the dqattach_locked call up before we take the
> ILOCK, like all the other callers in that file.
>
> Fixes: a526c85c2236 ("xfs: move xfs_file_iomap_begin_delay around") # goes further back than this
> Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> Reviewed-by: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
> v2: just do a regular dqattach, and tweak the commit message to make it
> clearer if it's dave or me talking

All looks good, thanks for doing the updates :)

-Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx