Re: [PATCH] btrfs/280: run defrag after creating file to get expected extent layout

Qu Wenruo <wqu@xxxxxxxx> · Thu, 6 Jun 2024 10:22:21 +0930

On 2024/6/6 08:47, Filipe Manana wrote:
On Wed, Jun 5, 2024 at 11:30 PM Qu Wenruo <quwenruo.btrfs@xxxxxxx> wrote:



On 2024/6/5 20:56, fdmanana@xxxxxxxxxx wrote:
From: Filipe Manana <fdmanana@xxxxxxxx>

The test writes a 128M file and expects to end up with 1024 extents, each
with a size of 128K, which is the maximum size for compressed extents.
Generally this is what happens, but often it's possible for writeback to
kick in while creating the file (due to memory pressure, or something
calling sync in parallel, etc) which may result in creating more and
smaller extents, which makes the test fail since its golden output
expects exactly 1024 extents with a size of 128K each.

So to work around this, run defrag after creating the file, which will ensure
we get only 128K extents in the file.

But defrag is not much different from reading the page and setting it dirty
for writeback again.

It can be affected by the same memory pressure and still get split.

Defrag locks the range, the pages, then dirties the pages and then
unlocks the pages. So any writeback attempt happening in parallel will
wait for the pages to be unlocked. So we shouldn't get extents smaller
than 128K. Did I miss anything?


You're right, I forgot the page is also locked, and the defrag cluster 
size is 256K, exactly aligned with compression extent size.

So it's completely fine.
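
The sizes discussed above can be checked with a little shell arithmetic. This is only a sketch of the numbers quoted in this thread (128M file, 128K maximum compressed extent, 256K defrag cluster); the variable names are made up for illustration and are not taken from the test or the kernel.

```shell
#!/bin/bash
# Sizes as quoted in the thread; variable names are illustrative only.
file_size=$((128 * 1024 * 1024))        # 128M file written by the test
max_compressed_extent=$((128 * 1024))   # 128K maximum compressed extent size
defrag_cluster=$((256 * 1024))          # 256K defrag cluster size

# Number of extents the golden output expects.
expected_extents=$((file_size / max_compressed_extent))

# A cluster holds a whole number of maximum-size extents only if the
# remainder is zero, i.e. cluster boundaries fall on extent boundaries.
extents_per_cluster=$((defrag_cluster / max_compressed_extent))
cluster_remainder=$((defrag_cluster % max_compressed_extent))

echo "expected extents:    $expected_extents"
echo "extents per cluster: $extents_per_cluster"
echo "cluster remainder:   $cluster_remainder"
```

A zero remainder is what "exactly aligned" means here: each defrag cluster covers two full 128K extents, so the cluster boundaries themselves do not introduce smaller extents.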


I guess you chose compressed file extents to bump up the subvolume
tree while also staying compatible with all sector sizes.

Yes, and to be fast and use very little space.


In that case, what about doing DIO using the sector size of the fs?
That way each DIO write would result in one file extent item, and
since it's a single sector/page, memory pressure will never be able to
write back that sector halfway.

I thought about DIO, but would have to leave holes between every
extent (and for that I would rather use buffered IO, for simplicity and
because it's probably faster).
Otherwise fiemap merges all adjacent extents, you get one 8M extent
reported, covering the range of the odd single profile data block group created
by mkfs, and another one for the remainder of the file - it's just
ugly and hard to reason about, plus that could break one day if we
ever get rid of that 8M data block group.
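
To illustrate why holes would be needed, here is a rough sketch of the merging effect. It is a simplified model, not the actual fiemap code: the hypothetical helper merge_adjacent takes lines of "file_offset physical_offset length" and coalesces entries that are contiguous both in the file and on disk. The offsets below are invented for the example.

```shell
#!/bin/bash
# Hypothetical helper, a simplified model of fiemap-style merging:
# input lines are "file_offset physical_offset length" (in bytes),
# sorted by file offset. Entries contiguous in both the file and the
# physical address space are reported as a single extent.
merge_adjacent() {
    awk '
        # Current line continues the pending extent: extend it.
        NR > 1 && $1 == lend && $2 == pend {
            len += $3; lend += $3; pend += $3; next
        }
        # Not contiguous: emit the pending extent before starting a new one.
        NR > 1 { print lstart, pstart, len }
        # Start a new pending extent from the current line.
        {
            lstart = $1; pstart = $2; len = $3
            lend = $1 + $3; pend = $2 + $3
        }
        END { if (NR > 0) print lstart, pstart, len }
    '
}

# Three back-to-back 4K writes (made-up offsets): reported as one extent.
contiguous=$(printf '0 1000 4096\n4096 5096 4096\n8192 9192 4096\n' | merge_adjacent)

# Same writes with a 4K hole after each one: each stays a separate extent.
with_holes=$(printf '0 1000 4096\n8192 5096 4096\n16384 9192 4096\n' | merge_adjacent)

echo "contiguous:"; echo "$contiguous"
echo "with holes:"; echo "$with_holes"
```

In this model the contiguous case collapses to a single 12K entry, while the layout with holes keeps three entries, which is why per-sector DIO writes would not give a predictable extent count in the fiemap output unless holes were left between them.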

Yep, fiemap merging is another problem.

So this looks totally fine now.

Reviewed-by: Qu Wenruo <wqu@xxxxxxxx>

Thanks,
Qu




Thanks,
Qu

Signed-off-by: Filipe Manana <fdmanana@xxxxxxxx>
---
   tests/btrfs/280 | 10 +++++++++-
   1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/tests/btrfs/280 b/tests/btrfs/280
index d4f613ce..0f7f8a37 100755
--- a/tests/btrfs/280
+++ b/tests/btrfs/280
@@ -13,7 +13,7 @@
   # the backref walking code, used by fiemap to determine if an extent is shared.
   #
   . ./common/preamble
-_begin_fstest auto quick compress snapshot fiemap
+_begin_fstest auto quick compress snapshot fiemap defrag

   . ./common/filter
   . ./common/punch # for _filter_fiemap_flags
@@ -36,6 +36,14 @@ _scratch_mount -o compress
   # extent tree (if the root was a leaf, we would have only data references).
   $XFS_IO_PROG -f -c "pwrite -b 1M 0 128M" $SCRATCH_MNT/foo | _filter_xfs_io

+# While writing the file it's possible, but rare, that writeback kicked in due
+# to memory pressure or a concurrent sync call for example, so we may end up
+# with extents smaller than 128K (the maximum size for compressed extents) and
+# therefore make the test expectations fail because we get more extents than
+# what the golden output expects. So run defrag to make sure we get exactly
+# the expected number of 128K extents (1024 extents).
+$BTRFS_UTIL_PROG filesystem defrag "$SCRATCH_MNT/foo" >> $seqres.full
+
   # Create a RW snapshot of the default subvolume.
   _btrfs subvolume snapshot $SCRATCH_MNT $SCRATCH_MNT/snap