Re: [BUG?] bcachefs: keep writing to device when there is no high-level I/O activity.

On Tue, Aug 27, 2024 at 05:49:33PM GMT, David Wang wrote:
> Hi,
> 
> I was using two partitions on the same NVMe device to compare filesystem performance,
> and I consistently observed a strange behavior:
> 
> After a 10-minute fio test with bcachefs on one partition, performance degrades
> significantly for other filesystems on the other partition (same device).
> 
> 	ext4  150M/s --> 143M/s
> 	xfs   150M/s --> 134M/s
> 	btrfs 127M/s --> 108M/s
> 
> Several rounds of tests show the same pattern: bcachefs seems to occupy some device resource
> even when there is no high-level I/O.

This is a known issue; it should be either journal reclaim or
rebalance.

(We could use some better stats to see exactly which it is)

The algorithm for how we do background work needs to change; I've
written up a new one but I'm a ways off from having time to implement it:

https://evilpiepirate.org/git/bcachefs.git/commit/?h=bcachefs-garbage&id=47a4b574fb420aa824aad222436f4c294daf66ae

Could be a fun one for someone new to take on.

> 
> I monitored /proc/diskstats, and it confirmed that bcachefs does keep writing to the device.
> The following is a time series of samples for "writes_completed" on my bcachefs partition:
> 
> writes_completed @timestamp
> 	       0 @1724748233.712
> 	       4 @1724748248.712    <--- mkfs
> 	       4 @1724748263.712
> 	      65 @1724748278.712
> 	   25350 @1724748293.712
> 	   63839 @1724748308.712    <--- fio started
>   	  352228 @1724748323.712
> 	  621350 @1724748338.712
> 	  903487 @1724748353.712
>         ...
> 	12790311 @1724748863.712
> 	13100041 @1724748878.712
> 	13419642 @1724748893.712
> 	13701685 @1724748908.712    <--- fio done (10minutes)
> 	13701769 @1724748923.712    <--- from here, average 5~7writes/second for 2000 seconds
> 	13701852 @1724748938.712
> 	13701953 @1724748953.712
> 	13702032 @1724748968.712
> 	13702133 @1724748983.712
> 	13702213 @1724748998.712
> 	13702265 @1724749013.712
> 	13702357 @1724749028.712
>         ...
> 	13712984 @1724750858.712
> 	13713076 @1724750873.712
> 	13713196 @1724750888.712
> 	13713299 @1724750903.712
> 	13713386 @1724750918.712
> 	13713463 @1724750933.712
> 	13713501 @1724750948.712   <--- writes stopped here
> 	13713501 @1724750963.712
> 	13713501 @1724750978.712
> 	...
> 
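
For anyone who wants to reproduce the sampling above, here is a minimal sketch (not the script that produced the numbers in this thread). It assumes the standard /proc/diskstats layout, where column 3 is the device name and column 8 is the count of completed writes. The function names, the optional stats-file argument, and the 15-second default interval (chosen to match the spacing of the samples above) are illustrative assumptions.

```shell
#!/bin/sh
# Sketch only: function names and defaults are illustrative, not from the thread.

# Print the writes_completed counter for one block device.
#   $1  device name as shown in column 3 of diskstats (e.g. nvme0n1p1)
#   $2  stats file to read; defaults to /proc/diskstats
writes_completed() {
    awk -v dev="$1" '$3 == dev { print $8 }' "${2:-/proc/diskstats}"
}

# Average writes per second between two samples.
#   $1 $2  counter and timestamp of the first sample
#   $3 $4  counter and timestamp of the second sample
write_rate() {
    awk -v c0="$1" -v t0="$2" -v c1="$3" -v t1="$4" \
        'BEGIN { printf "%.2f\n", (c1 - c0) / (t1 - t0) }'
}

# Print one "count @timestamp" line every $2 seconds (default 15)
# until interrupted.
sample_writes() {
    echo "writes_completed @timestamp"
    while :; do
        printf '%12s @%s\n' "$(writes_completed "$1")" "$(date +%s)"
        sleep "${2:-15}"
    done
}
```

Calling `sample_writes nvme0n1p1` prints lines in the same shape as the samples above. Applying `write_rate 13701685 1724748908.712 13713501 1724750948.712` to the "fio done" and "writes stopped here" samples gives about 5.79 writes/second over 2040 seconds, consistent with the 5~7 writes/second noted in the table.
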
> Is this behavior expected? 
> 
> My test script:
> 	set -e
> 	for fsa in "btrfs" "ext4" "bcachefs" "xfs"
> 	do
> 		if [ "$fsa" = 'ext4' ]; then
> 			mkfs -t ext4 -F /dev/nvme0n1p1
> 		else
> 			mkfs -t $fsa -f /dev/nvme0n1p1
> 		fi
> 		mount -t $fsa /dev/nvme0n1p1 /disk02/dir1
> 		for fsb in "ext4" "bcachefs" "xfs" "btrfs"
> 		do
> 			if [ "$fsb" = 'ext4' ]; then
> 				mkfs -t ext4 -F /dev/nvme0n1p2
> 			else
> 				mkfs -t $fsb -f /dev/nvme0n1p2
> 			fi
> 			mount -t $fsb /dev/nvme0n1p2 /disk02/dir2
> 
> 			cd /disk02/dir1 && fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test  --bs=4k --iodepth=64 --size=1G --readwrite=randrw  --runtime=600 --numjobs=8 --time_based=1 --output=/disk02/fio.${fsa}.${fsb}.0
> 			sleep 30
> 			cd /disk02/dir2 && fio --randrepeat=1 --ioengine=libaio --direct=1 --name=test  --bs=4k --iodepth=64 --size=1G --readwrite=randrw  --runtime=600 --numjobs=8 --time_based=1 --output=/disk02/fio.${fsa}.${fsb}.1
> 			sleep 30
> 			cd /disk02
> 			umount /disk02/dir2
> 		done
> 		umount /disk02/dir1
> 	done
> 
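
The mkfs branch appears twice in the script above, once per partition. One possible way to factor it out is sketched below; `force_flag` and `make_fs` are hypothetical helper names, and the only behavior assumed is what the script already does (`-F` for ext4, `-f` for the other three filesystems).

```shell
# Sketch only: force_flag and make_fs are hypothetical helper names.

# mkfs for ext4 spells its "force" option -F; the other three
# filesystems in the test are given -f, as in the script above.
force_flag() {
    case "$1" in
        ext4) echo "-F" ;;
        *)    echo "-f" ;;
    esac
}

# Create filesystem type $1 on block device $2, overwriting what is there.
make_fs() {
    mkfs -t "$1" "$(force_flag "$1")" "$2"
}
```

With these, each if/else block in the loops would reduce to a single call such as `make_fs "$fsa" /dev/nvme0n1p1`.
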
> And here is a report for one round of test matrix:
> +----------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
> |   R|W    |             ext4            |           bcachefs          |             xfs             |            btrfs            |
> +----------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
> |   ext4   |    [ext4]147MB/s|147MB/s    |    [ext4]146MB/s|146MB/s    |    [ext4]150MB/s|150MB/s    |    [ext4]149MB/s|149MB/s    |
> |          |    [ext4]146MB/s|146MB/s    | [bcachefs]72.2MB/s|72.2MB/s |     [xfs]149MB/s|149MB/s    |    [btrfs]132MB/s|132MB/s   |
> | bcachefs | [bcachefs]71.9MB/s|71.9MB/s | [bcachefs]65.1MB/s|65.1MB/s | [bcachefs]69.6MB/s|69.6MB/s | [bcachefs]65.8MB/s|65.8MB/s |
> |          |    [ext4]143MB/s|143MB/s    | [bcachefs]71.5MB/s|71.5MB/s |     [xfs]134MB/s|133MB/s    |    [btrfs]108MB/s|108MB/s   |
> |   xfs    |     [xfs]148MB/s|148MB/s    |     [xfs]147MB/s|147MB/s    |     [xfs]152MB/s|152MB/s    |     [xfs]151MB/s|151MB/s    |
> |          |    [ext4]147MB/s|147MB/s    | [bcachefs]71.3MB/s|71.3MB/s |     [xfs]148MB/s|148MB/s    |    [btrfs]127MB/s|127MB/s   |
> |  btrfs   |    [btrfs]132MB/s|132MB/s   |    [btrfs]112MB/s|111MB/s   |    [btrfs]110MB/s|110MB/s   |    [btrfs]110MB/s|110MB/s   |
> |          |    [ext4]147MB/s|146MB/s    | [bcachefs]69.7MB/s|69.7MB/s |     [xfs]146MB/s|146MB/s    |    [btrfs]125MB/s|125MB/s   |
> +----------+-----------------------------+-----------------------------+-----------------------------+-----------------------------+
> (The rows are for the FS on the first partition, and the cols are on the second partition)
> 
> The version of bcachefs-tools on my system is 1.9.1.
> (The impact was worse, with ext4 dropping to 80M/s, when I was using bcachefs-tools from the Debian repos, which is too *old*
> and known to cause bcachefs problems. That is the reason I ran this kind of test.)
> 
> 
> Thanks
> David
> 



