Re: [PATCH RFC] fstests: btrfs: add a test case to make sure btrfs can handle certain interleaved free space correctly




On Tue, Jul 19, 2022 at 01:16:54PM +0800, Qu Wenruo wrote:
> 
> 
> On 2022/7/19 01:59, Zorro Lang wrote:
> > On Mon, Jul 18, 2022 at 02:18:23PM +0800, Qu Wenruo wrote:
> > > This is a future-proofing test, mostly for the upcoming zoned
> > > raid-stripe-tree (RST) and P/Q COW based RAID56 implementations.
> > > 
> > > Unlike regular devices, zoned devices can not do overwrites without
> > > resetting (reclaiming) a whole zone.
> > > 
> > > And for the RST and P/Q COW based RAID56, the idea is to CoW the parity
> > > stripe to another location.
> > > 
> > > But all of the above behaviors introduce a limitation if we fill the fs,
> > > then free half of the space in an interleaved pattern.
> > > 
> > > - For basic zoned btrfs (aka SINGLE profile for now)
> > >    Normally this means we have no free space at all.
> > > 
> > >    Thankfully zoned btrfs has GC and reserved zones to reclaim those
> > >    half-filled zones.
> > >    In theory we should be able to do new writes.
> > > 
> > > - For the future RST with P/Q CoW for RAID56 on non-zoned devices.
> > >    This is more complex. In this case, we would have the following
> > >    layout for every full stripe:
> > >            0                           64K
> > >    Disk A  |XXXXXXXXXXXXXXXXXXXXXXXXXXX| (Data 1)
> > >    Disk B  |                           | (Data 2)
> > >    Disk C  |XXXXXXXXXXXXXXXXXXXXXXXXXXX| (P stripe)
> > > 
> > >    Although in theory we can write into Disk B, we first have to find
> > >    free space for the new parity.
> > > 
> > >    But all the other full stripes look the same, which means we deadlock
> > >    trying to find purely free space without doing sub-stripe writes.
> > > 
> > >    This means that even for non-zoned btrfs, we still need GC and
> > >    reserved space to handle P/Q CoW properly.
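> > >
> > >    A back-of-the-envelope illustration (hypothetical numbers): with
> > >    nr_data = 2 and a 64K stripe length, 1000 such half-freed full
> > >    stripes still report 1000 * 64K = 62.5M of free data space, yet not
> > >    one full stripe is completely empty, so there is nowhere to CoW a
> > >    new parity stripe without garbage collection first.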
> > > 
> > > Another thing specific to this test case: to reduce the runtime, I
> > > use 256M as the mkfs size for each device.
> > > (A full run with a KASAN-enabled kernel already takes over 700 seconds.)
> > > 
> > > So far this only works for non-zoned disks, as 256M is too small for
> > > zoned devices to have enough zones.
> > > 
> > > Thus I need extra advice from the zoned device folks.
> > > 
> > > Cc: Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx>
> > > Signed-off-by: Qu Wenruo <wqu@xxxxxxxx>
> > > ---
> > 
> > I think this patch needs more review from the btrfs list. I'll just review
> > this patch from the fstests side, as below ...
> > 
> > >   tests/btrfs/261     | 129 ++++++++++++++++++++++++++++++++++++++++++++
> > >   tests/btrfs/261.out |   2 +
> > >   2 files changed, 131 insertions(+)
> > >   create mode 100755 tests/btrfs/261
> > >   create mode 100644 tests/btrfs/261.out
> > > 
> > > diff --git a/tests/btrfs/261 b/tests/btrfs/261
> > > new file mode 100755
> > > index 00000000..01da4759
> > > --- /dev/null
> > > +++ b/tests/btrfs/261
> > > @@ -0,0 +1,129 @@
> > > +#! /bin/bash
> > > +# SPDX-License-Identifier: GPL-2.0
> > > +# Copyright (C) 2022 SUSE Linux Products GmbH. All Rights Reserved.
> > > +#
> > > +# FS QA Test 261
> > > +#
> > > +# Make sure all supported profiles (including future zoned RAID56) have a
> > > +# proper way to handle an fs with interleaved filled space, and can still
> > > +# write data into the fs.
> > > +#
> > > +# This is mostly inspired by some discussion on P/Q COW for RAID56. Even for
> > > +# regular devices, this can be problematic if we fill the fs and then delete
> > > +# half of the extents in an interleaved pattern. Without proper GC and extra
> > > +# reserved space, such a P/Q CoW scheme would run out of space (even if one
> > > +# data stripe is free, there is no place to CoW its P/Q).
> > > +#
> > > +. ./common/preamble
> > > +_begin_fstest auto enospc raid
> > > +
> > > +# Override the default cleanup function.
> > > +# _cleanup()
> > > +# {
> > > +# 	cd /
> > > +# 	rm -r -f $tmp.*
> > > +# }
> > 
> > This _cleanup looks like nothing special; you can remove it and use the
> > default one.
> 
> It's still commented out, just from the template.
>
> Or do you mean I should delete the unused cleanup function if we don't need it?

Sure, please remove those unused comments from the template; they're just
reminders.
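
I.e. with those removed, the top of the test can simply be (rough sketch,
using only what the patch already has):

	. ./common/preamble
	_begin_fstest auto enospc raid

	# Import common functions.
	. ./common/filter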

> 
> > 
> > > +
> > > +# Import common functions.
> > > +. ./common/filter
> > > +
> > > +# real QA test starts here
> > > +
> > > +# Modify as appropriate.
> >       ^^^
> >     Remove this line please.
> > 
> > > +_supported_fs btrfs
> > > +# We check the scratch dev after each profile run
> > > +_require_scratch_nocheck
> > > +_require_scratch_dev_pool 4
> > > +
> > > +fill_fs()
> > 
> > There's a helper named _fill_fs() in the common/populate file. I'm not sure
> > if there's anything special in your fill_fs function; better to check whether
> > our common helper can work for you.
> 
> The fill_fs here is to make sure we fill the fs in a specific way
> (always filling the fs in 128KiB chunks, while still being able to
> delete 64KiB of each chunk later).
>
> I'll add a comment explaining the reason.
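>
> Something like this, just as a sketch (assuming nr_data == 2 and the
> default 64K stripe length):
>
> 	# Don't use the generic _fill_fs() helper here: with nr_data == 2
> 	# and 64K stripes, each pair of 64K files fills exactly one RAID56
> 	# full stripe (128K of data), and deleting one file of each pair
> 	# later frees exactly one data stripe per full stripe -- the
> 	# interleaved half-free layout this test depends on.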

Sure, if the common _fill_fs can't help you, feel free to have your own
one (and better to explain what's special about it).

> 
> > 
> > > +{
> > > +	for (( i = 0;; i += 2 )); do
> > > +		$XFS_IO_PROG -f -c "pwrite 0 64K" $SCRATCH_MNT/file_$i \
> > > +			&> /dev/null
> > > +		if [ $? -ne 0 ]; then
> > > +			break
> > > +		fi
> > > +		$XFS_IO_PROG -f -c "pwrite 0 64K" $SCRATCH_MNT/file_$(($i + 1)) \
> > > +			&> /dev/null
> > > +		if [ $? -ne 0 ]; then
> > > +			break
> > > +		fi
> > > +
> > > +		# Only sync after every 512K of data written.
> > > +		if [ $(( $i % 8)) -eq 0 ]; then
> > > +			sync
> > > +		fi
> > > +	done
> > > +
> > > +	# Sync whatever hasn't been synced yet.
> > > +	sync
> > > +
> > > +	echo "fs filled with $((i / 2)) full stripe writes" >> $seqres.full
> > > +
> > > +	# Delete half of the files created above, which should leave
> > > +	# the fs half empty. For RAID56 this would leave each of its full
> > > +	# stripes with one full data stripe, one free data stripe,
> > > +	# and the P/Q stripe still in use.
> > > +	rm -rf -- $SCRATCH_MNT/file_*[02468]
> > > +
> > > +	# Sync to make sure the space of the deleted files is really freed.
> > > +	sync
> > > +}
> > > +
> > > +run_test()
> > > +{
> > > +	local profile=$1
> > > +	local nr_dev=$2
> > > +
> > > +	echo "=== profile=$profile nr_dev=$nr_dev ===" >> $seqres.full
> > > +	_scratch_dev_pool_get $nr_dev
> > > +	# -b is for each device.
> > > +	# Here we use 256M to reduce the runtime.
> > > +	_scratch_pool_mkfs -b 256M -m$profile -d$profile >>$seqres.full 2>&1
> > 
> > Do you need to make sure this mkfs succeeded here?
> 
> Yes.
> 
> > 
> > > +	# make sure we created btrfs with desired options
> > > +	if [ $? -ne 0 ]; then
> > > +		echo "mkfs -m$profile -d$profile failed"
> > > +		return
> > > +	fi
> > > +	_scratch_mount >>$seqres.full 2>&1
> > 
> > If _scratch_mount fails, the test will exit directly, so generally we don't
> > need to redirect stdout/stderr. Or did you actually want to use
> > _try_scratch_mount here?
> 
> _scratch_mount is exactly what I need; I'll just remove the unnecessary
> redirection.

OK
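
FWIW, to spell out the difference (rough sketch): _scratch_mount fails the
whole test when the mount fails, so no error handling is needed, while
_try_scratch_mount returns a status for the caller to handle:

	# Aborts the test on mount failure:
	_scratch_mount

	# Returns nonzero instead, for tests that want to react themselves:
	if ! _try_scratch_mount; then
		echo "mount failed"
		return
	fi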

> 
> > 
> > > +
> > > +	fill_fs
> > > +
> > > +	# Now try to write 4M of data; with the fs half empty we should be
> > > +	# able to do that.
> > > +	# For zoned devices, this tests whether the GC and reserved zones
> > > +	# can handle such cases properly.
> > > +	$XFS_IO_PROG -f -c "pwrite 0 4M" -c sync $SCRATCH_MNT/final_write \
> > > +		>> $seqres.full 2>&1
> > > +	if [ $? -ne 0 ]; then
> > > +		echo "The final write failed"
> > > +	fi
> > > +
> > > +	_scratch_unmount
> > > +	# We called _require_scratch_nocheck instead of _require_scratch,
> > > +	# so do the check after testing each profile config.
> > > +	_check_scratch_fs
> > > +	echo  >> $seqres.full
> > > +	_scratch_dev_pool_put
> > > +}
> > > +
> > > +# Here we don't use _btrfs_profile_configs, as that doesn't include
> > > +# the number of devices; for full stripe writes on RAID56 we
> > > +# need to ensure nr_data is exactly 2, so here we manually specify
> > > +# the profile and number of devices.
> > > +run_test "single" "1"
> > > +
> > > +# Zoned only support
> > > +if _scratch_btrfs_is_zoned; then
> > > +	exit
> > 
> > I think this "exit" will fail the test directly, since status is still 1 at
> > that point. You can use _require_non_zoned_device() to run this case on
> > non-zoned devices only.
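> >
> > E.g. a one-liner near the top of the test (rough sketch):
> >
> > 	_require_non_zoned_device $SCRATCH_DEV
> >
> > Or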
> > 
> >    if ! _scratch_btrfs_is_zoned;then
> > 	run_test "raid0" "2"
> > 	run_test "raid1" "2"
> > 	run_test "raid10" "4"
> > 	run_test "raid5" "3"
> > 	run_test "raid6" "4"
> >    fi
> > 
> > As this case is "Silence is golden".
> > 
> > I'm not sure what you really need here; can these help?
> 
> My bad, I forgot to finish the comment, and your example is perfect.
> 
> Thanks for the review.

Np. But before I merge your patch we still need an RVB from the btrfs list at
least, even if I give it my own RVB, since this case is marked as a btrfs RFC.

Thanks,
Zorro

> Qu
> 
> > 
> > Thanks,
> > Zorro
> > 
> > > +fi
> > > +
> > > +run_test "raid0" "2"
> > > +run_test "raid1" "2"
> > > +run_test "raid10" "4"
> > > +run_test "raid5" "3"
> > > +run_test "raid6" "4"
> > > +
> > > +echo "Silence is golden"
> > > +status=0
> > > +exit
> > > diff --git a/tests/btrfs/261.out b/tests/btrfs/261.out
> > > new file mode 100644
> > > index 00000000..679ddc0f
> > > --- /dev/null
> > > +++ b/tests/btrfs/261.out
> > > @@ -0,0 +1,2 @@
> > > +QA output created by 261
> > > +Silence is golden
> > > --
> > > 2.36.1
> > > 
> > 
> 



