Re: [PATCH] generic: test cloning large exents to a file with many small extents

"Darrick J. Wong" <darrick.wong@xxxxxxxxxx> · Thu, 27 Jun 2019 20:32:22 -0700

On Thu, Jun 27, 2019 at 10:47:31PM +0100, Filipe Manana wrote:
> On Thu, Jun 27, 2019 at 9:28 PM Darrick J. Wong <darrick.wong@xxxxxxxxxx> wrote:
> >
> > On Thu, Jun 27, 2019 at 06:00:30PM +0100, fdmanana@xxxxxxxxxx wrote:
> > > From: Filipe Manana <fdmanana@xxxxxxxx>
> > >
> > > Test that if we clone a file with some large extents into a file that has
> > > many small extents, when the fs is nearly full, the clone operation does
> > > not fail and produces the correct result.
> > >
> > > This is motivated by a bug found in btrfs wich is fixed by the following
> > > patches for the linux kernel:
> > >
> > >  [PATCH 1/2] Btrfs: factor out extent dropping code from hole punch handler
> > >  [PATCH 2/2] Btrfs: fix ENOSPC errors, leading to transaction aborts, when
> > >              cloning extents
> > >
> > > The test currently passes on xfs.
> > >
> > > Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
> > > ---
> > >  tests/generic/558     | 75 +++++++++++++++++++++++++++++++++++++++++++++++++++
> > >  tests/generic/558.out |  5 ++++
> > >  tests/generic/group   |  1 +
> > >  3 files changed, 81 insertions(+)
> > >  create mode 100755 tests/generic/558
> > >  create mode 100644 tests/generic/558.out
> > >
> > > diff --git a/tests/generic/558 b/tests/generic/558
> > > new file mode 100755
> > > index 00000000..ee16cdf7
> > > --- /dev/null
> > > +++ b/tests/generic/558
> > > @@ -0,0 +1,75 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (C) 2019 SUSE Linux Products GmbH. All Rights Reserved.
> > > +#
> > > +# FSQA Test No. 558
> > > +#
> > > +# Test that if we clone a file with some large extents into a file that has
> > > +# many small extents, when the fs is nearly full, the clone operation does
> > > +# not fail and produces the correct result.
> > > +#
> > > +seq=`basename $0`
> > > +seqres=$RESULT_DIR/$seq
> > > +echo "QA output created by $seq"
> > > +tmp=/tmp/$$
> > > +status=1     # failure is the default!
> > > +trap "_cleanup; exit \$status" 0 1 2 3 15
> > > +
> > > +_cleanup()
> > > +{
> > > +     cd /
> > > +     rm -f $tmp.*
> > > +}
> > > +
> > > +# get standard environment, filters and checks
> > > +. ./common/rc
> > > +. ./common/filter
> > > +. ./common/reflink
> > > +
> > > +# real QA test starts here
> > > +_supported_fs generic
> > > +_supported_os Linux
> > > +_require_scratch_reflink
> > > +
> > > +rm -f $seqres.full
> > > +
> > > +_scratch_mkfs_sized $((512 * 1024 * 1024)) >>$seqres.full 2>&1
> > > +_scratch_mount
> > > +
> > > +file_size=$(( 128 * 1024 * 1024 )) # 128Mb
> > > +extent_size=4096
> >
> > What if the fs block size is 64k?
> 
> Then we get extents of 64Kb instead of 4Kb. Works on btrfs. Is it a
> problem for xfs (or any other fs)?

It shouldn't be; I was merely wondering if 2048 extents was enough to
trigger the enospc on btrfs.

> >
> > > +num_extents=$(( $file_size / $extent_size ))
> > > +
> > > +# Create a file with many small extents.
> > > +for ((i = 0; i < $num_extents; i++)); do
> > > +     offset=$(( $i * $extent_size ))
> > > +     $XFS_IO_PROG -f -s -c "pwrite -S 0xe5 $offset $extent_size" \
> > > +             $SCRATCH_MNT/foo >>/dev/null
> > > +done
> >
> > I wouldn't have thought that this would actually succeed on xfs because
> > you could lay extents down one after the other, but then started seeing
> > this in the filefrag output:
> >
> > File size of /opt/foo is 528384 (129 blocks of 4096 bytes)
> >  ext:     logical_offset:        physical_offset: length:   expected: flags:
> >    3:       17..      17:         52..        52:      1:         37:
> >    4:       18..      18:         67..        67:      1:         53:
> >    5:       19..      19:         81..        81:      1:         68:
> >    6:       20..      20:         94..        94:      1:         82:
> >    7:       21..      21:        106..       106:      1:         95:
> >    8:       22..      22:        117..       117:      1:        107:
> >    9:       23..      23:        127..       127:      1:        118:
> >   10:       24..      24:        136..       136:      1:        128:
> >   11:       25..      25:        144..       144:      1:        137:
> >   12:       26..      26:        151..       151:      1:        145:
> >   13:       27..      27:        157..       157:      1:        152:
> >   14:       28..      28:        162..       162:      1:        158:
> >   15:       29..      29:        166..       166:      1:        163:
> >   16:       30..      30:        169..       169:      1:        167:
> >   17:       31..      32:        171..       172:      2:        170:
> >   18:       33..      33:        188..       188:      1:        173:
> >
> > 52, 67, 81, 94, 106, 117, 127, 136, 44, 151, 157, 162, 166, 169, 171...
> >
> >  +15 +14 +13 +12  +11  +10   +9   +8  +7   +6   +5   +4   +3   +2...
> >
> > Hm, I wonder what quirk of the xfs allocator this might be?
> 
> Don't ask me, I have no idea how it works :)

Ah, it's because xfs frees post-eof allocations every time we close a
file... and I never got the series that fixes this finalized and sent
upstream.

> >
> > > +
> > > +# Create file bar with the same size that file foo has but with large extents.
> > > +$XFS_IO_PROG -f -c "pwrite -S 0xc7 -b $file_size 0 $file_size" \
> > > +     $SCRATCH_MNT/bar >>/dev/null
> > > +
> > > +# Fill the fs (for btrfs we are interested in filling all unallocated space
> > > +# and most of the existing metadata block group(s), so that after this there
> > > +# will be no unallocated space and metadata space will be mostly full but with
> > > +# more than enough free space for the clone operation below to succeed).
> > > +i=1
> > > +while true; do
> > > +     $XFS_IO_PROG -f -c "pwrite 0 2K" $SCRATCH_MNT/filler_$i &> /dev/null
> > > +     [ $? -ne 0 ] && break
> > > +     i=$(( i + 1 ))
> > > +done
> >
> > _fill_fs?
> 
> No, because that will do fallocate. Writing 2Kb files here is on
> purpose because that will fill the metadata
> block group(s) (on btrfs we inline extents in the btree (metadata) as
> long as they are not bigger than 2Kb, by default).
> Using fallocate, we end up using data block groups.

Ok.

> >
> > > +
> > > +# Now clone file bar into file foo. This is supposed to succeed and not fail
> > > +# with ENOSPC for example.
> > > +$XFS_IO_PROG -c "reflink $SCRATCH_MNT/bar" $SCRATCH_MNT/foo >>/dev/null
> >
> > _reflink $SCRATCH_MNT/bar $SCRATCH_MNT/foo ?
> 
> Does exactly the same. I don't mind changing it however.

<Nod>

--D

> Thanks.
> 
> >
> > --D
> >
> > > +
> > > +_scratch_remount
> > > +
> > > +echo "File foo data after cloning and remount:"
> > > +od -A d -t x1 $SCRATCH_MNT/foo
> > > +
> > > +status=0
> > > +exit
> > > diff --git a/tests/generic/558.out b/tests/generic/558.out
> > > new file mode 100644
> > > index 00000000..d1e8e70f
> > > --- /dev/null
> > > +++ b/tests/generic/558.out
> > > @@ -0,0 +1,5 @@
> > > +QA output created by 558
> > > +File foo data after cloning and remount:
> > > +0000000 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7 c7
> > > +*
> > > +134217728
> > > diff --git a/tests/generic/group b/tests/generic/group
> > > index 543c0627..c06c1cd1 100644
> > > --- a/tests/generic/group
> > > +++ b/tests/generic/group
> > > @@ -560,3 +560,4 @@
> > >  555 auto quick cap
> > >  556 auto quick casefold
> > >  557 auto quick log
> > > +558 auto clone
> > > --
> > > 2.11.0
> > >