On 28.03.22 13:28, Thorsten Leemhuis wrote:
> Hi Btrfs developers, this is your Linux kernel regression tracker.
> Top-posting for once, to make this easily accessible to everyone.
>
> Are there any known regressions in 5.15 that cause more inode evictions?
> There is a user reporting such problems, see the msg quoted below, which
> is the last from this thread:
>
> https://lore.kernel.org/all/MN2PR20MB2512314446801B92562E26B5D2169@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
>
> @Bruno: sorry, you report is very hard to follow at least for me. Maybe
> the btrfs developers have a idea what's wrong here, but if not you
> likely need to make this easier for all us:
>
> Write a really simple testcase and run it on those vanilla kernels that
> matter, which are in this case: The latest 5.17.y and 5.15.y releases to
> check if it was already solved -- and if not, on 5.14 and 5.15. Maybe
> you need to bisect the issue to bring us closer the the root of the
> problem. But with a bit of luck the btrfs developers might have an idea
> already.
>
> Ciao, Thosten

Hi Thorsten,

Thanks for the feedback.
I tried my best to make it simple this time. I hope it is, at least,
better than before.

Grazie,
Bruno


*** Test case about high inode eviction on the 5.15 kernel ***

This regression was first observed during rpm operations with specific
packages that became A LOT slower to update, ranging from 4 to 30
minutes [1]. The symptoms are: high cpu usage, high inode eviction and
much slower I/O performance.
By analyzing rpm's strace report and running some experiments, I was
able to replace rpm with a script that does 3 things: rename a file,
unlink the renamed file and create a new file.

This test case is designed to trigger the following regression on the
5.15 kernel:
* repeated renameat2, unlink and openat system calls reach files with
  the btrfs compression property enabled.
* the combination of these system calls and the btrfs compression
  property triggers the high inode eviction.
* the high inode eviction causes too much work for the btrfs directory
  logging.
* the overloaded btrfs directory logging causes the slower I/O
  performance.

A simplified script is supplied. For a more capable script, more
information and more test results please refer to my github account [2].

[1] https://bugzilla.opensuse.org/show_bug.cgi?id=1193549
[2] https://github.com/bdamascen0/s3e

Index:
1. Vanilla kernels
2. k5.14.21 x k5.15.32
2.1. Detailed test report for compressed files
2.2. Comparison results for compressed files
2.3. Detailed test report for uncompressed files
2.4. Comparison results for uncompressed files
2.5. Cpu usage
3. k5.17.1 and k5.16.15
3.1. Basic test report
4. Simplified test script


1. Vanilla kernels

This test case mainly covers the following vanilla kernels [3]:
5.15.32 and 5.14.21.

The 5.15.32 vanilla kernel produced abnormal results:
* high cpu usage, high inode eviction and much slower I/O performance
  for compressed files.
* double inode eviction and slightly worse I/O performance for
  uncompressed files.

The 5.14.21 vanilla kernel produced normal results and is used as a
reference.
The 5.17.1 and 5.16.15 vanilla kernels [4] also produced normal
results, which can be found at the end (section 3).

[3] https://wiki.ubuntu.com/Kernel/MainlineBuilds
[4] https://software.opensuse.org/package/kernel-vanilla
2. k5.14.21 x k5.15.32

2.1. Detailed test report for compressed files

ubuntu jammy jellyfish -- kernel 5.14.21 --- vanilla --- (kvm)
..updating 50 files on /mnt/inode-ev/zstd: Job took 226 ms
@inode_evictions: 51
..updating 50 files on /mnt/inode-ev/lzo: Job took 222 ms
@inode_evictions: 51
..updating 100 files on /mnt/inode-ev/zstd: Job took 384 ms
@inode_evictions: 101
..updating 100 files on /mnt/inode-ev/lzo: Job took 462 ms
@inode_evictions: 101
..updating 150 files on /mnt/inode-ev/zstd: Job took 493 ms
@inode_evictions: 151
..updating 150 files on /mnt/inode-ev/lzo: Job took 554 ms
@inode_evictions: 151
..updating 200 files on /mnt/inode-ev/zstd: Job took 804 ms
@inode_evictions: 201
..updating 200 files on /mnt/inode-ev/lzo: Job took 725 ms
@inode_evictions: 201
..updating 250 files on /mnt/inode-ev/zstd: Job took 745 ms
@inode_evictions: 251
..updating 250 files on /mnt/inode-ev/lzo: Job took 758 ms
@inode_evictions: 251
..updating 1000 files on /mnt/inode-ev/zstd: Job took 3452 ms
@inode_evictions: 1001
..updating 1000 files on /mnt/inode-ev/lzo: Job took 2979 ms
@inode_evictions: 1001

ubuntu jammy jellyfish -- kernel 5.15.32 --- vanilla --- (kvm)
..updating 50 files on /mnt/inode-ev/zstd: Job took 420 ms
@inode_evictions: 1275
..updating 50 files on /mnt/inode-ev/lzo: Job took 488 ms
@inode_evictions: 1275
..updating 100 files on /mnt/inode-ev/zstd: Job took 1649 ms
@inode_evictions: 5050
..updating 100 files on /mnt/inode-ev/lzo: Job took 1566 ms
@inode_evictions: 5050
..updating 150 files on /mnt/inode-ev/zstd: Job took 4448 ms
@inode_evictions: 11325
..updating 150 files on /mnt/inode-ev/lzo: Job took 4136 ms
@inode_evictions: 11325
..updating 200 files on /mnt/inode-ev/zstd: Job took 9177 ms
@inode_evictions: 20100
..updating 200 files on /mnt/inode-ev/lzo: Job took 9070 ms
@inode_evictions: 20100
..updating 250 files on /mnt/inode-ev/zstd: Job took 16191 ms
@inode_evictions: 31375
..updating 250 files on /mnt/inode-ev/lzo: Job took 16062 ms
@inode_evictions: 31375
..updating 1000 files on /mnt/inode-ev/zstd: Job took 132865 ms
@inode_evictions: 104195
..updating 1000 files on /mnt/inode-ev/lzo: Job took 131979 ms
@inode_evictions: 106639

2.2. Comparison results for compressed files

k5.15.32 vanilla (compared to: k5.14.21 vanilla)
50 files gives approx. 1.8 x more time and approx. 25 x more inode evictions
100 files gives approx. 3.3 x more time and approx. 50 x more inode evictions
150 files gives approx. 7.4 x more time and approx. 75 x more inode evictions
200 files gives approx. 11.4 x more time and approx. 100 x more inode evictions
250 files gives approx. 21.1 x more time and approx. 125 x more inode evictions
1000 files gives approx. 38.4 x more time and approx. 100 x more inode evictions
2.3. Detailed test report for uncompressed files

ubuntu jammy jellyfish -- kernel 5.14.21 --- vanilla --- (kvm)
..updating 50 files on /mnt/inode-ev/uncompressed: Job took 214 ms
@inode_evictions: 51
..updating 100 files on /mnt/inode-ev/uncompressed: Job took 402 ms
@inode_evictions: 101
..updating 150 files on /mnt/inode-ev/uncompressed: Job took 543 ms
@inode_evictions: 151
..updating 200 files on /mnt/inode-ev/uncompressed: Job took 694 ms
@inode_evictions: 201
..updating 250 files on /mnt/inode-ev/uncompressed: Job took 835 ms
@inode_evictions: 251
..updating 1000 files on /mnt/inode-ev/uncompressed: Job took 3162 ms
@inode_evictions: 1001

ubuntu jammy jellyfish -- kernel 5.15.32 --- vanilla --- (kvm)
..updating 50 files on /mnt/inode-ev/uncompressed: Job took 269 ms
@inode_evictions: 99
..updating 100 files on /mnt/inode-ev/uncompressed: Job took 359 ms
@inode_evictions: 199
..updating 150 files on /mnt/inode-ev/uncompressed: Job took 675 ms
@inode_evictions: 299
..updating 200 files on /mnt/inode-ev/uncompressed: Job took 752 ms
@inode_evictions: 399
..updating 250 files on /mnt/inode-ev/uncompressed: Job took 1149 ms
@inode_evictions: 499
..updating 1000 files on /mnt/inode-ev/uncompressed: Job took 7333 ms
@inode_evictions: 1999

2.4. Comparison results for uncompressed files

k5.15.32 vanilla (compared to: k5.14.21 vanilla)
50 files gives approx. 1.2 x more time and approx. 2 x more inode evictions
100 files gives approx. 0.8 x more time and approx. 2 x more inode evictions
150 files gives approx. 1.2 x more time and approx. 2 x more inode evictions
200 files gives approx. 1.0 x more time and approx. 2 x more inode evictions
250 files gives approx. 1.3 x more time and approx. 2 x more inode evictions
1000 files gives approx. 2.3 x more time and approx. 2 x more inode evictions

2.5. Cpu usage

ubuntu jammy jellyfish -- kernel 5.15.32 --- vanilla --- (kvm)
..updating 1000 files on /mnt/inode-ev/zstd: Job took 132691 ms
- real 2m12,731s    sys 2m 7,134s
..updating 1000 files on /mnt/inode-ev/lzo: Job took 134130 ms
- real 2m14,149s    sys 2m 8,447s
..updating 1000 files on /mnt/inode-ev/uncompressed: Job took 7241 ms
- real 0m 7,256s    sys 0m 4,732s

3. k5.17.1 and k5.16.15

Just for the record, the 5.16 kernel never reproduced the regression.
The real-life workload that triggers the regression, the rpm package
updates, was verified to work fine since 5.16~rc6 [1]. It was therefore
expected that the synthetic workload from the script would also produce
normal results on the 5.16 and 5.17 kernels.

[1] https://bugzilla.opensuse.org/show_bug.cgi?id=1193549

3.1. Basic test report

opensuse tumbleweed ----- kernel 5.16.15 --- vanilla --- (kvm)
..updating 250 files on /mnt/inode-ev/zstd: Job took 910 ms
@inode_evictions: 250
..updating 250 files on /mnt/inode-ev/lzo: Job took 740 ms
@inode_evictions: 250
..updating 250 files on /mnt/inode-ev/uncompressed: Job took 717 ms
@inode_evictions: 250

opensuse tumbleweed ----- kernel 5.17.1 ---- vanilla --- (kvm)
..updating 250 files on /mnt/inode-ev/zstd: Job took 701 ms
@inode_evictions: 250
..updating 250 files on /mnt/inode-ev/lzo: Job took 695 ms
@inode_evictions: 250
..updating 250 files on /mnt/inode-ev/uncompressed: Job took 954 ms
@inode_evictions: 250

4. Simplified test script

This simplified script sets up, formats and mounts a ramdisk block
device. It creates 3 test folders (zstd, lzo, uncompressed) and sets
the btrfs compression property on each of them. Each time the script
is executed, 3 tests are run, and bpftrace is started right before
each test to count the inode evictions.
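Before the full script, this is the per-file sequence its inner loop
runs, i.e. the pattern that replaces the rpm update. It is only a
minimal sketch extracted from the script below; the syscall names in
the comments refer to the renameat2/unlink/openat pattern mentioned at
the top of this report and are my assumption about how mv, unlink and
the shell redirection translate to system calls:

  # one update iteration for an existing file_1 inside a folder that
  # has the btrfs compression property set
  mv $DIR/file_1 $DIR/file_1-RPMDELETE   # rename   -> renameat2
  unlink $DIR/file_1-RPMDELETE           # delete   -> unlink
  echo -n > $DIR/file_1                  # recreate -> openat (O_CREAT|O_TRUNC)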
#!/bin/bash
# s3e_t3.sh (based on s3e.sh version 4.7)
# Simple Syscall Signature Emulator (test3)
# test3: populate + test. test renameat2/openat + unlink syscalls w/ empty files (3x)
# Copyright (c) 2022 Bruno Damasceno <bdamasceno@xxxxxxxxxxxxxx>
# Warning: no safety checks

dir1=zstd
dir2=lzo
dir3=uncompressed
DIR=zzz
NUM_FILES=250
DEV=/dev/ram0
MNT=/mnt/inode-ev
DIR_1="$MNT/$dir1"
DIR_2="$MNT/$dir2"
DIR_3="$MNT/$dir3"

populate() {
    DIR=$1
    echo "...populating 1st generation of files on $DIR:"
    for ((i = 1; i <= $NUM_FILES; i++)); do
        echo -n > $DIR/file_$i
    done
}

run_test() {
    DIR=$1
    sync
    xfs_io -c "fsync" $DIR
    echo -e "\n...updating $NUM_FILES files on $DIR:"
    # dumb pause so bpftrace has time to attach its probe
    sleep 3s
    start=$(date +%s%N)
    for ((i = 1; i <= $NUM_FILES; i++)); do
        mv $DIR/file_$i $DIR/file_$i-RPMDELETE
        unlink $DIR/file_$i-RPMDELETE
        echo -n > $DIR/file_$i
        echo -n "_$i"
        [ $i != $NUM_FILES ] && echo -ne "\r"
    done
    end=$(date +%s%N)
    dur=$(( (end - start) / 1000000 ))
    echo -ne "\r"
    echo "Job took $dur milliseconds"
}

modprobe brd rd_size=128000 max_part=1 rd_nr=1
mkfs.btrfs --label inode-ev --force $DEV > /dev/null
mkdir $MNT
mount $DEV $MNT
mkdir $MNT/{$dir1,$dir2,$dir3}
btrfs property set $DIR_1 compression zstd:1
btrfs property set $DIR_2 compression lzo
btrfs property set $DIR_3 compression none

for dir in "$DIR_1" "$DIR_2" "$DIR_3"
do
    populate "$dir"
    bpftrace -e 'kprobe:btrfs_evict_inode { @inode_evictions = count(); }' &
    run_test "$dir"
    pkill bpftrace
    # dumb pause to wait for the bpftrace report
    sleep 2s
done

umount $MNT
rm --dir $MNT
rmmod brd
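The script has to run as root (modprobe, mkfs.btrfs, mount and bpftrace
all need it) and cleans up after itself by unmounting and removing the
ramdisk at the end. If the last three cleanup lines are commented out,
the compression setup can be double-checked while the filesystem is
still mounted, for example with the small sketch below (it assumes the
"btrfs property get" subcommand from btrfs-progs, the counterpart of
the "property set" calls used above):

  # print the compression property of each test folder
  for d in zstd lzo uncompressed; do
      echo -n "/mnt/inode-ev/$d: "
      btrfs property get /mnt/inode-ev/$d compression
  done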