Re: [PATCH 05/12] generic/562: handle ENOSPC while cloning gracefully

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Sun, 24 Nov 2024 21:18:54 -0800

On Mon, Nov 18, 2024 at 04:31:31PM -0800, Darrick J. Wong wrote:
> On Tue, Nov 19, 2024 at 12:17:56AM +0000, Filipe Manana wrote:
> > On Mon, Nov 18, 2024 at 11:03 PM Darrick J. Wong <djwong@xxxxxxxxxx> wrote:
> > >
> > > From: Darrick J. Wong <djwong@xxxxxxxxxx>
> > >
> > > This test creates a couple of patterned files on a tiny filesystem,
> > > fragments the free space, clones one patterned file to the other, and
> > > checks that the entire file was cloned.
> > >
> > > However, this test doesn't work on a 64k fsblock filesystem because
> > > we've used up all the free space reservation for the rmapbt, and that
> > > causes the FICLONE to error out with ENOSPC partway through.  Hence we
> > > need to detect the ENOSPC and _notrun the test.
> > >
> > > That said, it turns out that XFS has been silently dropping error codes
> > > if we managed to make some progress cloning extents.  That's ok if the
> > > operation has REMAP_FILE_CAN_SHORTEN like copy_file_range does, but
> > > FICLONE/FICLONERANGE do not permit partial results, so the dropped error
> > > codes is actually an error.
> > >
> > > Therefore, this testcase now becomes a regression test for the patch to
> > > fix that.
> > >
> > > Signed-off-by: Darrick J. Wong <djwong@xxxxxxxxxx>
> > > ---
> > >  tests/generic/562 |   10 ++++++++--
> > >  1 file changed, 8 insertions(+), 2 deletions(-)
> > >
> > >
> > > diff --git a/tests/generic/562 b/tests/generic/562
> > > index 91360c4154a6a2..62899945003513 100755
> > > --- a/tests/generic/562
> > > +++ b/tests/generic/562
> > > @@ -15,6 +15,9 @@ _begin_fstest auto clone punch
> > >  . ./common/filter
> > >  . ./common/reflink
> > >
> > > +test "$FSTYP" = "xfs" && \
> > > +       _fixed_by_kernel_commit XXXXXXXXXX "xfs: don't drop errno values when we fail to ficlone the entire range"
> > > +
> > >  _require_scratch_reflink
> > >  _require_test_program "punch-alternating"
> > >  _require_xfs_io_command "fpunch"
> > > @@ -48,8 +51,11 @@ while true; do
> > >  done
> > >
> > >  # Now clone file bar into file foo. This is supposed to succeed and not fail
> > > -# with ENOSPC for example.
> > > -_reflink $SCRATCH_MNT/bar $SCRATCH_MNT/foo >>$seqres.full
> > > +# with ENOSPC for example.  However, XFS will sometimes run out of space.
> > > +_reflink $SCRATCH_MNT/bar $SCRATCH_MNT/foo >>$seqres.full 2> $tmp.err
> > > +cat $tmp.err
> > > +grep -q 'No space left on device' $tmp.err && \
> > > +       _notrun "ran out of space while cloning"
> > 
> > This defeats the original purpose of the test, which was to verify
> > btrfs didn't fail with -ENOSPC (or any other error).
> > 
> > If XFS has an ENOSPC issue in some cases, and it's not fixable, why
> > not make it _notrun only if it's XFS that is being tested?
> 
> Ok, will do.  In the case of xfs, we don't let you share data if the
> allocation group it's in is more than about 90% full.

Er... scratch that response :)

It's true that XFS doesn't let you share data if the AG is more than 90%
full, but this test exposed a data corruption vector in 4.20-6.12 if you
did run out of space.  Granted you'd only hit that if you had 64k block
size and a fairly small disk (~33GB) but still, that's reason enough to
keep running on XFS even if it runs out of space.  ENOSPC is better than
data corruption.

--D

> --D
> 
> > Thanks.
> > 
> > 
> > >
> > >  # Unmount and mount the filesystem again to verify the operation was durably
> > >  # persisted.
> > >
> > >
> > 
>