On Fri, Feb 11, 2022 at 08:01:24AM +0800, Qu Wenruo wrote: > > > On 2022/2/11 00:37, Filipe Manana wrote: > > On Tue, Feb 8, 2022 at 12:00 PM Qu Wenruo <wqu@xxxxxxxx> wrote: > > > > > > There is a long existing problem that extent_map::generation is not > > > populated (thus always 0) if its read from disk. > > > > > > This can prevent btrfs autodefrag from working as it relies on > > > extent_map::generation. > > > If it's always 0, then autodefrag will not consider the range as a > > > defrag target. > > > > > > The test case itself will verify the behavior by: > > > > > > - Create a fragmented file > > > By writing backwards with OSYNC > > > This will also queue the file for autodefrag. > > > > > > - Drop all cache > > > Including the extent map cache, meaning later read will > > > all get extent map by reading from on-disk file extent items. > > > > > > - Trigger autodefrag and verify the file layout > > > If defrag works, the new file layout should differ from the original > > > one. > > > > > > The kernel fix is titled: > > > > > > btrfs: populate extent_map::generation when reading from disk > > > > > > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> > > > --- > > > tests/btrfs/259 | 64 +++++++++++++++++++++++++++++++++++++++++++++ > > > tests/btrfs/259.out | 2 ++ > > > 2 files changed, 66 insertions(+) > > > create mode 100755 tests/btrfs/259 > > > create mode 100644 tests/btrfs/259.out > > > > > > diff --git a/tests/btrfs/259 b/tests/btrfs/259 > > > new file mode 100755 > > > index 00000000..577e4ce4 > > > --- /dev/null > > > +++ b/tests/btrfs/259 > > > @@ -0,0 +1,64 @@ > > > +#! /bin/bash > > > +# SPDX-License-Identifier: GPL-2.0 > > > +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved. > > > +# > > > +# FS QA Test 259 > > > +# > > > +# Make sure autodefrag can still defrag the file even their extent maps are > > > +# read from disk > > > +# > > > +. ./common/preamble > > > +_begin_fstest auto quick defrag > > > + > > > +# Override the default cleanup function. > > > +# _cleanup() > > > +# { > > > +# cd / > > > +# rm -r -f $tmp.* > > > +# } > > > + > > > +# Import common functions. > > > +# . ./common/filter > > > + > > > +# real QA test starts here > > > + > > > +# Modify as appropriate. > > > +_supported_fs btrfs > > > +_require_scratch > > > + > > > +# Need 4K sectorsize, as the autodefrag threshold is only 64K, > > > +# thus 64K sectorsize will not work. > > > +_require_btrfs_support_sectorsize 4096 > > > > Missing a: > > > > _require_xfs_io_command fiemap > > > > > +_scratch_mkfs -s 4k >> $seqres.full > > > +_scratch_mount -o datacow,autodefrag > > > + > > > +# Create fragmented write > > > +$XFS_IO_PROG -f -s -c "pwrite 24k 8k" -c "pwrite 16k 8k" \ > > > + -c "pwrite 8k 8k" -c "pwrite 0 8k" \ > > > + "$SCRATCH_MNT/foobar" >> $seqres.full > > > +sync > > > > A comment on why this sync is needed would be good to have. > > It may be confusing to the reader since we were doing synchronous writes before. > > > > > + > > > +echo "=== Before autodefrag ===" >> $seqres.full > > > +$XFS_IO_PROG -c "fiemap -v" "$SCRATCH_MNT/foobar" >> $tmp.before > > > +cat $tmp.before >> $seqres.full > > > + > > > +# Drop the cache (including extent map cache per-inode) > > > +echo 3 > /proc/sys/vm/drop_caches > > > + > > > +# Now trigger autodefrag > > > > A bit more explanation would be useful. > > > > Set the commit interval to 1 second, so that 1 second after the > > remount the transaction kthread runs > > and wakes up the cleaner kthread, which in turn will run autodefrag. > > > > > +_scratch_remount commit=1 > > > +sleep 3 > > > +sync > > > > This sync is useless, so it should go away. > > Autodefrag doesn't write data back at all. > It just mark the target range dirty and wait for later writeback. > > Thus sync is still needed AFAIK. Well, something weird is going on then. Removing the sync, makes the test still fail on an unpatched kernel, and on a patched kernel, it still succeeds. Looking at the .full file, the results are correct: === Before autodefrag === /home/fdmanana/btrfs-tests/scratch_1/foobar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..15]: 26672..26687 16 0x0 1: [16..31]: 26656..26671 16 0x0 2: [32..47]: 26640..26655 16 0x0 3: [48..63]: 26624..26639 16 0x1 === After autodefrag === /home/fdmanana/btrfs-tests/scratch_1/foobar: EXT: FILE-OFFSET BLOCK-RANGE TOTAL FLAGS 0: [0..63]: 26688..26751 64 0x1 So why this worked seems to be because: - After doing the writes to the file, a delayed iput on the inode did a wakeup on the cleaner kthread; - The cleaner kthread ran defrag; - btrfs_remount() calls sync_filesystem(), flushing all delalloc. So, yes, it's better to leave the 'sync', as there's no way to guarantee the cleaner ran defrag before remount. Perhaps a comment about that sync would be useful as well. Thanks. > > Thanks, > Qu > > > > > Otherwise, it looks good and the test works as expected. > > > > Thanks for doing it. > > > > > + > > > +echo "=== After autodefrag ===" >> $seqres.full > > > +$XFS_IO_PROG -c "fiemap -v" "$SCRATCH_MNT/foobar" >> $tmp.after > > > +cat $tmp.after >> $seqres.full > > > + > > > +# The layout should differ if autodefrag is working > > > +diff $tmp.before $tmp.after > /dev/null && echo "autodefrag didn't defrag the file" > > > + > > > +echo "Silence is golden" > > > + > > > +# success, all done > > > +status=0 > > > +exit > > > diff --git a/tests/btrfs/259.out b/tests/btrfs/259.out > > > new file mode 100644 > > > index 00000000..bfbd2dea > > > --- /dev/null > > > +++ b/tests/btrfs/259.out > > > @@ -0,0 +1,2 @@ > > > +QA output created by 259 > > > +Silence is golden > > > -- > > > 2.34.1 > > >