Re: [RFC 0/1] adapting btrfs/237 to work with the new reclaim algorithm

Johannes Thumshirn <Johannes.Thumshirn@xxxxxxx> · Mon, 5 Dec 2022 07:56:01 +0000

On 19.08.22 13:53, Pankaj Raghav wrote:
> Hi,
> Since commit 3687fcb0752a ("btrfs: zoned: make auto-reclaim less
> aggressive"), the reclaim algorithm only triggers auto-reclaim once the
> filesystem's used space exceeds a certain threshold. This change breaks
> test btrfs/237.
> 
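The threshold trigger described above can be modelled as a simple percentage check. This is a minimal sketch, assuming that reclaim fires once allocated device bytes reach bg_reclaim_threshold percent of the total device size. `should_reclaim` is a hypothetical helper, and the byte counts are illustrative, not measured:

```shell
#!/bin/sh
# Simplified model of the zoned auto-reclaim trigger (an assumption, not the
# kernel code): reclaim fires once allocated device bytes reach
# bg_reclaim_threshold percent of the total device size.
# should_reclaim is a hypothetical helper used only for illustration.
should_reclaim() {
    local used=$1 total=$2 threshold=$3 factor
    # In this model a threshold of 0 disables auto-reclaim.
    if [ "$threshold" -eq 0 ]; then
        echo no
        return
    fi
    # Integer percentage, truncated toward zero.
    factor=$(( used * 100 / total ))
    if [ "$factor" -ge "$threshold" ]; then
        echo yes
    else
        echo no
    fi
}

gib=$(( 1024 * 1024 * 1024 ))
total=$(( 5 * gib ))                          # assumed 5 GiB device
should_reclaim "$gib" "$total" 51             # prints "no"  (20% used)
should_reclaim $(( 3 * gib )) "$total" 51     # prints "yes" (60% used)
```

Because the percentage is truncated, usage has to reach the full threshold value before this model reports that reclaim should run.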
> I tried to adapt the test by doing the following:
> - Write a small file first
> - Write a big file that increases the disk usage to be more than the
>   reclaim threshold
> - Delete the big file to trigger reclaim
> - Ensure the small file is relocated and the space used by the big file
>   is reclaimed.
> 
> My test case works properly for small ZNS drives but not for larger
> drives in QEMU. With a 100G drive, not all zones that were used by the
> big file are reclaimed. Either I am not setting up the test correctly or
> something is wrong with how reclaim works for zoned devices.
> 
> I created a simple script that reproduces the scenario without running
> the test. Please adapt $DEV and $big_file_size to your drive. As I set
> bg_reclaim_threshold to 51, $big_file_size should be at least 51% of the
> drive size.
> 
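The 51% sizing rule above is plain arithmetic. This is a minimal sketch, assuming the device size is known in bytes and that "100GB" means 100 GiB. `min_big_file_mib` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: smallest big_file_size in MiB (rounded up) that covers
# $2 percent of a device of $1 bytes. Other allocations (e.g. metadata) may
# also count towards usage, so treat the result as a conservative estimate.
min_big_file_mib() {
    local dev_bytes=$1 threshold=$2 mib=1048576
    # Ceiling division: (a + b - 1) / b
    echo $(( (dev_bytes * threshold + 100 * mib - 1) / (100 * mib) ))
}

# On a real system the size could be read with util-linux:
#   dev_bytes=$(blockdev --getsize64 "/dev/$DEV")
dev_bytes=$(( 100 * 1024 * 1024 * 1024 ))    # assumed 100 GiB drive
echo "big_file_size=$(min_big_file_mib "$dev_bytes" 51)M"   # prints big_file_size=52224M
```

Rounding up ensures the file never falls just short of the threshold because of integer truncation.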
> ```
> DEV=nvme0n3
> DEV_PATH=/dev/$DEV
> big_file_size=2500M
> 
> echo "mq-deadline" > /sys/block/$DEV/queue/scheduler
> umount /mnt/scratch
> blkzone reset $DEV_PATH
> mkfs.btrfs -f -d single -m single $DEV_PATH > /dev/null
> mount -t btrfs $DEV_PATH /mnt/scratch
> uuid=$(btrfs fi show $DEV_PATH | grep 'uuid' | awk '{print $NF}')
> 
> echo "51" > /sys/fs/btrfs/$uuid/bg_reclaim_threshold
> 
> fio --filename=/mnt/scratch/test2 --size=1M --rw=write --bs=4k \
> 	--name=btrfs_zoned > /dev/null
> btrfs fi sync /mnt/scratch
> 
> echo "Open zones before big file transfer:"
> blkzone report $DEV_PATH | grep -v -e em -e nw | wc -l
> 
> fio --filename=/mnt/scratch/test1 --size=$big_file_size --rw=write --bs=4k \
> 	--ioengine=io_uring --name=btrfs_zoned > /dev/null
> btrfs fi sync /mnt/scratch
> 
> echo "Open zones before removing the file:"
> blkzone report $DEV_PATH | grep -v -e em -e nw | wc -l
> rm /mnt/scratch/test1
> btrfs fi sync /mnt/scratch
> 
> echo "Going to sleep. Removed the file"
> sleep 30
> 
> echo "Open zones after reclaim:"
> blkzone report $DEV_PATH | grep -v -e em -e nw | wc -l
> ```
> 
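Note that `grep -v -e em -e nw` counts every zone that is neither empty nor a not-write-pointer (conventional) zone, so full zones are included in the "open zones" figures. The following is a minimal sketch for breaking the count down per zone condition. It assumes that `blkzone report` prints each zone's condition as `zcond:NN(xx)`. `zone_conditions` is a hypothetical helper:

```shell
#!/bin/sh
# Hypothetical helper: count zones per condition from `blkzone report` output.
# Assumes each zone line contains "zcond:NN(xx)", where xx is the short
# condition name (em = empty, oi/oe = implicitly/explicitly open,
# cl = closed, fu = full, nw = not write-pointer).
zone_conditions() {
    sed -n 's/.*zcond: *[0-9]*(\([a-z]*\)).*/\1/p' | sort | uniq -c |
        awk '{ print $2, $1 }'
}

# Usage (requires a zoned block device):
#   blkzone report "$DEV_PATH" | zone_conditions
```

Separating `fu` from `oi`/`oe`/`cl` shows whether the zones left after deletion are still open or were filled and never reset.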
> I am getting the following output in QEMU:
> 
> - 5GB ZNS drive with 128MB zone size (and zone capacity), working as
>   expected:
> 
> ```
> Open zones before big file transfer:
> 4
> Open zones before removing the file:
> 23
> Going to sleep. Removed the file
> Open zones after reclaim:
> 4
> ```
> 
> - 100GB ZNS drive with 128MB zone size (and zone capacity), **not
>   working** as expected:
> 
> ```
> Open zones before big file transfer:
> 4
> Open zones before removing the file:
> 455
> Going to sleep. Removed the file
> Open zones after reclaim:
> 411
> ```
> 
> Only partial reclaim happens on larger drives. As a result, a subsequent
> fio transfer fails with ENOSPC before the drive's actual capacity is
> reached, because most of the zones have not been reclaimed and are
> effectively unusable.
> 
> Is there a limit on how many block groups (bgs) can be reclaimed?
> 
> Let me know whether I am doing something wrong in the test or whether
> this is an actual issue.
> 
> Pankaj Raghav (1):
>   btrfs/237: adapt the test to work with the new reclaim algorithm
> 
>  tests/btrfs/237 | 80 +++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 57 insertions(+), 23 deletions(-)
> 

Btw, whatever happened to this patch?