On Mon, Oct 25, 2021 at 09:00:53AM -0400, Brian Foster wrote:
> Test that COW writeback that overlaps non-shared delalloc blocks
> does not leave around stale delalloc blocks on I/O failure. This
> triggers assert failures and free space accounting corruption on
> XFS.
>
> Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> ---
>
> v2:
> - Explicitly set COW extent size hint.
> - Move to tests/xfs.
> - Various minor cleanups.
> v1: https://lore.kernel.org/fstests/20211021163959.1887011-1-bfoster@xxxxxxxxxx/
>
>  tests/xfs/999     | 62 +++++++++++++++++++++++++++++++++++++++++++++++
>  tests/xfs/999.out |  2 ++
>  2 files changed, 64 insertions(+)
>  create mode 100755 tests/xfs/999
>  create mode 100644 tests/xfs/999.out
>
> diff --git a/tests/xfs/999 b/tests/xfs/999
> new file mode 100755
> index 00000000..f27972bc
> --- /dev/null
> +++ b/tests/xfs/999
> @@ -0,0 +1,62 @@
> +#! /bin/bash
> +# SPDX-License-Identifier: GPL-2.0
> +# Copyright (c) 2021 Red Hat, Inc. All Rights Reserved.
> +#
> +# FS QA Test 999
> +#
> +# Test that COW writeback that overlaps non-shared delalloc blocks does not
> +# leave around stale delalloc blocks on I/O failure. This triggers assert
> +# failures and free space accounting corruption on XFS.
> +#
> +. ./common/preamble
> +_begin_fstest auto quick clone
> +
> +_cleanup()
> +{
> +	_cleanup_flakey
> +	cd /
> +	rm -r -f $tmp.*
> +}
> +
> +# Import common functions.
> +. ./common/reflink
> +. ./common/dmflakey
> +
> +# real QA test starts here
> +_supported_fs xfs
> +_require_scratch_reflink
> +_require_cp_reflink
> +_require_xfs_io_command "cowextsize"
> +_require_flakey_with_error_writes
> +
> +_scratch_mkfs >> $seqres.full
> +_init_flakey
> +_mount_flakey
> +
> +blksz=$(_get_file_block_size $SCRATCH_MNT)
> +
> +# Set the COW extent size hint to guarantee COW fork preallocation occurs over a
> +# bordering block offset.
> +$XFS_IO_PROG -c "cowextsize $((blksz * 2))" $SCRATCH_MNT >> $seqres.full
> +
> +# create two files that share a single block
> +$XFS_IO_PROG -fc "pwrite $blksz $blksz" $SCRATCH_MNT/file1 >> $seqres.full
> +_cp_reflink $SCRATCH_MNT/file1 $SCRATCH_MNT/file2
> +
> +# Perform a buffered write across the shared and non-shared blocks. On XFS, this
> +# creates a COW fork extent that covers the shared block as well as the just
> +# created non-shared delalloc block. Fail the writeback to verify that all
> +# delayed allocation is cleaned up properly.
> +_load_flakey_table $FLAKEY_ERROR_WRITES
> +$XFS_IO_PROG -c "pwrite 0 $((blksz * 2))" \
> +	-c fsync $SCRATCH_MNT/file2 >> $seqres.full
> +_load_flakey_table $FLAKEY_ALLOW_WRITES

Hmm. So I've been running this test in my djwong-dev tree and hit this
last night:

--- xfs/999.out
+++ xfs/999.out.bad
@@ -1,2 +1,3 @@
 QA output created by 999
-fsync: Input/output error
+stat: Input/output error
+cp: failed to access '/opt/file3': Input/output error

Digging into the kernel log, I see this happen:

[10240.821719] XFS (dm-0): Mounting V5 Filesystem
[10240.855461] XFS (dm-0): Ending clean mount
[10240.857030] XFS (dm-0): Quotacheck needed: Please wait.
[10240.860095] XFS (dm-0): Quotacheck: Done.
[10240.977055] XFS (dm-0): log I/O error -5
[10240.977459] XFS (dm-0): Log I/O Error (0x2) detected at xlog_ioend_work+0x5f/0xb0 [xfs] (fs/xfs/xfs_log.c:1377). Shutting down filesystem.
[10240.978682] XFS (dm-0): Please unmount the filesystem and rectify the problem(s)
[10241.044886] XFS (dm-0): Unmounting Filesystem

I guess the log tried to checkpoint for the brief window where the
flakey table was enabled, and shut down the whole fs? I don't have any
good ideas for how to solve this, though.

Hm. What if you did something like:

$XFS_IO_PROG -c 'pwrite...' $SCRATCH_MNT/file2
_load_flakey_table $FLAKEY_ERROR_WRITES
$XFS_IO_PROG -c 'sync_range -wa' $SCRATCH_MNT/file2
_load_flakey_table $FLAKEY_ALLOW_WRITES

to constrain the window in which disk writes will fail? Seeing as
sync_file_range doesn't actually tell the fs to flush its own metadata
or anything.

(Yikes, did I finally find a use for sync_file_range??)

--D

> +
> +# Try a post-fail reflink and then unmount. Both of these are known to produce
> +# errors and/or assert failures on XFS if we trip over a stale delalloc block.
> +_cp_reflink $SCRATCH_MNT/file2 $SCRATCH_MNT/file3
> +_unmount_flakey
> +
> +# success, all done
> +status=0
> +exit
> diff --git a/tests/xfs/999.out b/tests/xfs/999.out
> new file mode 100644
> index 00000000..88b69c4c
> --- /dev/null
> +++ b/tests/xfs/999.out
> @@ -0,0 +1,2 @@
> +QA output created by 999
> +fsync: Input/output error
> --
> 2.31.1
>
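
For reference, the sync_range idea from the reply above could be sketched
as a small helper. This is only an illustration under assumptions, not a
tested change to the patch: the helper name fail_data_writeback is
hypothetical, the explicit offset and length given to sync_range are an
addition (the reply elides them), and -w / -a are assumed to select
SYNC_FILE_RANGE_WRITE and SYNC_FILE_RANGE_WAIT_AFTER as described for
xfs_io's sync_range command.

```shell
# Hypothetical helper (not in the patch): dirty the file while writes still
# succeed, then fail only the data writeback started by sync_range.
fail_data_writeback()
{
	local file=$1
	local len=$2

	# Buffered write; nothing is forced to disk yet.
	$XFS_IO_PROG -c "pwrite 0 $len" "$file" >> "$seqres.full"

	# Start failing writes, push out just this file's dirty range and wait
	# for it, then allow writes again right away.
	_load_flakey_table $FLAKEY_ERROR_WRITES
	$XFS_IO_PROG -c "sync_range -w -a 0 $len" "$file" >> "$seqres.full"
	_load_flakey_table $FLAKEY_ALLOW_WRITES
}
```

In the test this would be called as
fail_data_writeback $SCRATCH_MNT/file2 $((blksz * 2)). It presumably
narrows the failure window rather than closing it, since a log write
could still land between the two table loads, and the golden output
would likely need updating because the error would no longer be
reported by fsync.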