Re: [regression] 5.15 kernel triggering 100x more inode evictions

On 28.03.22 13:28, Thorsten Leemhuis wrote:
> Hi Btrfs developers, this is your Linux kernel regression tracker.
> Top-posting for once, to make this easily accessible to everyone.
> 
> Are there any known regressions in 5.15 that cause more inode evictions?
> There is a user reporting such problems, see the msg quoted below, which
> is the last from this thread:
> 
> https://lore.kernel.org/all/MN2PR20MB2512314446801B92562E26B5D2169@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx/
> 
> @Bruno: sorry, your report is very hard to follow, at least for me. Maybe
> the btrfs developers have an idea of what's wrong here, but if not you
> likely need to make this easier for all of us:
> 
> Write a really simple testcase and run it on those vanilla kernels that
> matter, which are in this case: The latest 5.17.y and 5.15.y releases to
> check if it was already solved -- and if not, on 5.14 and 5.15. Maybe
> you need to bisect the issue to bring us closer to the root of the
> problem. But with a bit of luck the btrfs developers might have an idea
> already.
> 
> Ciao, Thorsten

Hi Thorsten,
Thanks for the feedback.
I tried my best to make it simple this time.
I hope it is, at least, better than before.
Grazie, Bruno

*** Test case about high inode eviction on the 5.15 kernel ***

This regression was first observed during rpm operations: updates of specific packages became a lot slower, taking anywhere from 4 to 30 minutes [1].
The symptoms are: high CPU usage, high inode eviction and much slower I/O performance.
By analyzing rpm's strace report and running some experiments, I was able to replace rpm with a script that does 3 things per file:
- rename the file, unlink the renamed file and create a new file (see the snippet below).
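
For reference, the per-file operation sequence the script emulates looks like this (illustrative shell commands mirroring the full script in section 4, not rpm's exact code path):

    mv file_1 file_1-RPMDELETE     # renameat2()
    unlink file_1-RPMDELETE        # unlink()
    echo -n > file_1               # openat() with O_CREAT, recreating the file empty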

This test case is designed to trigger the following regression on the 5.15 kernel:
* repeated renameat2, unlink and openat system calls operate on files with the btrfs compression property enabled.
* the combination of these system calls and the btrfs compression property triggers the high inode eviction (counted with the bpftrace probe shown below).
* the high inode eviction causes too much work for the btrfs directory logging.
* the overloaded btrfs directory logging causes the slow I/O performance.
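
The @inode_evictions counts in the reports below come from a bpftrace probe on btrfs_evict_inode, the same one-liner used by the script in section 4:

    bpftrace -e 'kprobe:btrfs_evict_inode { @inode_evictions = count(); }'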

A simplified script is supplied.
For a more capable script, more information and more test results please refer to my github account [2].

[1] https://bugzilla.opensuse.org/show_bug.cgi?id=1193549
[2] https://github.com/bdamascen0/s3e

Index:
1.   Vanilla kernels
2.   k5.14.21 x k5.15.32
2.1. Detailed test report for compressed files
2.2. Comparison results for compressed files
2.3. Detailed test report for uncompressed files
2.4. Comparison results for uncompressed files
2.5. Cpu usage
3.   k5.17.1 and k5.16.15
3.1. Basic test report
4.   Simplified test script


1.   Vanilla kernels

This test case mainly covers the following vanilla kernels [3]: 5.15.32 and 5.14.21.
The 5.15.32 vanilla kernel produced abnormal results:
* high CPU usage, high inode eviction and much slower I/O performance for compressed files.
* double the inode evictions and slightly worse I/O performance for uncompressed files.
The 5.14.21 vanilla kernel produced normal results and is used as the reference.
The 5.17.1 and 5.16.15 vanilla kernels [4] produced normal results, which can be found at the end.

[3] https://wiki.ubuntu.com/Kernel/MainlineBuilds 
[4] https://software.opensuse.org/package/kernel-vanilla


2.   k5.14.21 x k5.15.32

2.1. Detailed test report for compressed files

ubuntu jammy jellyfish -- kernel 5.14.21 --- vanilla --- (kvm)
..updating   50 files on /mnt/inode-ev/zstd: Job took    226 ms @inode_evictions: 51
..updating   50 files on /mnt/inode-ev/lzo:  Job took    222 ms @inode_evictions: 51
..updating  100 files on /mnt/inode-ev/zstd: Job took    384 ms @inode_evictions: 101
..updating  100 files on /mnt/inode-ev/lzo:  Job took    462 ms @inode_evictions: 101
..updating  150 files on /mnt/inode-ev/zstd: Job took    493 ms @inode_evictions: 151
..updating  150 files on /mnt/inode-ev/lzo:  Job took    554 ms @inode_evictions: 151
..updating  200 files on /mnt/inode-ev/zstd: Job took    804 ms @inode_evictions: 201
..updating  200 files on /mnt/inode-ev/lzo:  Job took    725 ms @inode_evictions: 201
..updating  250 files on /mnt/inode-ev/zstd: Job took    745 ms @inode_evictions: 251
..updating  250 files on /mnt/inode-ev/lzo:  Job took    758 ms @inode_evictions: 251
..updating 1000 files on /mnt/inode-ev/zstd: Job took   3452 ms @inode_evictions: 1001
..updating 1000 files on /mnt/inode-ev/lzo:  Job took   2979 ms @inode_evictions: 1001
ubuntu jammy jellyfish -- kernel 5.15.32 --- vanilla --- (kvm)
..updating   50 files on /mnt/inode-ev/zstd: Job took    420 ms @inode_evictions: 1275
..updating   50 files on /mnt/inode-ev/lzo:  Job took    488 ms @inode_evictions: 1275
..updating  100 files on /mnt/inode-ev/zstd: Job took   1649 ms @inode_evictions: 5050
..updating  100 files on /mnt/inode-ev/lzo:  Job took   1566 ms @inode_evictions: 5050
..updating  150 files on /mnt/inode-ev/zstd: Job took   4448 ms @inode_evictions: 11325
..updating  150 files on /mnt/inode-ev/lzo:  Job took   4136 ms @inode_evictions: 11325
..updating  200 files on /mnt/inode-ev/zstd: Job took   9177 ms @inode_evictions: 20100
..updating  200 files on /mnt/inode-ev/lzo:  Job took   9070 ms @inode_evictions: 20100
..updating  250 files on /mnt/inode-ev/zstd: Job took  16191 ms @inode_evictions: 31375
..updating  250 files on /mnt/inode-ev/lzo:  Job took  16062 ms @inode_evictions: 31375
..updating 1000 files on /mnt/inode-ev/zstd: Job took 132865 ms @inode_evictions: 104195
..updating 1000 files on /mnt/inode-ev/lzo:  Job took 131979 ms @inode_evictions: 106639

2.2. Comparison results for compressed files

k5.15.32 vanilla (compared to: k5.14.21 vanilla)
50   files gives approx.  1.8 x more time and approx.  25 x more inode evictions
100  files gives approx.  3.3 x more time and approx.  50 x more inode evictions
150  files gives approx.  7.4 x more time and approx.  75 x more inode evictions
200  files gives approx. 11.4 x more time and approx. 100 x more inode evictions
250  files gives approx. 21.1 x more time and approx. 125 x more inode evictions
1000 files gives approx. 38.4 x more time and approx. 100 x more inode evictions
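
A side note on the numbers above (my own reading of the data, not a statement about the kernel internals): for the 50-250 file runs on 5.15.32 the eviction count is exactly n*(n+1)/2, as if creating each new file evicted every inode processed so far; the 1000-file runs no longer match that formula. A quick check:

    for n in 50 100 150 200 250; do echo "$n files -> $(( n * (n + 1) / 2 )) evictions"; done
    # prints 1275, 5050, 11325, 20100 and 31375 - matching the measured counts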

2.3. Detailed test report for uncompressed files

ubuntu jammy jellyfish -- kernel 5.14.21 --- vanilla --- (kvm)
..updating   50 files on /mnt/inode-ev/uncompressed: Job took  214 ms @inode_evictions: 51
..updating  100 files on /mnt/inode-ev/uncompressed: Job took  402 ms @inode_evictions: 101
..updating  150 files on /mnt/inode-ev/uncompressed: Job took  543 ms @inode_evictions: 151
..updating  200 files on /mnt/inode-ev/uncompressed: Job took  694 ms @inode_evictions: 201
..updating  250 files on /mnt/inode-ev/uncompressed: Job took  835 ms @inode_evictions: 251
..updating 1000 files on /mnt/inode-ev/uncompressed: Job took 3162 ms @inode_evictions: 1001
ubuntu jammy jellyfish -- kernel 5.15.32 --- vanilla --- (kvm)
..updating   50 files on /mnt/inode-ev/uncompressed: Job took  269 ms @inode_evictions: 99
..updating  100 files on /mnt/inode-ev/uncompressed: Job took  359 ms @inode_evictions: 199
..updating  150 files on /mnt/inode-ev/uncompressed: Job took  675 ms @inode_evictions: 299
..updating  200 files on /mnt/inode-ev/uncompressed: Job took  752 ms @inode_evictions: 399
..updating  250 files on /mnt/inode-ev/uncompressed: Job took 1149 ms @inode_evictions: 499
..updating 1000 files on /mnt/inode-ev/uncompressed: Job took 7333 ms @inode_evictions: 1999

2.4. Comparison results for uncompressed files

k5.15.32 vanilla (compared to: k5.14.21 vanilla)
50   files gives approx. 1.2 x more time and approx. 2 x more inode evictions
100  files gives approx. 0.8 x more time and approx. 2 x more inode evictions
150  files gives approx. 1.2 x more time and approx. 2 x more inode evictions
200  files gives approx. 1.0 x more time and approx. 2 x more inode evictions
250  files gives approx. 1.3 x more time and approx. 2 x more inode evictions
1000 files gives approx. 2.3 x more time and approx. 2 x more inode evictions

2.5. CPU usage

ubuntu jammy jellyfish -- kernel 5.15.32 --- vanilla --- (kvm)
..updating 1000 files on /mnt/inode-ev/zstd:         Job took 132691 ms - real 2m12,731s sys 2m 7,134s
..updating 1000 files on /mnt/inode-ev/lzo:          Job took 134130 ms - real 2m14,149s sys 2m 8,447s
..updating 1000 files on /mnt/inode-ev/uncompressed: Job took   7241 ms - real 0m 7,256s sys 0m 4,732s


3.   k5.17.1 and k5.16.15

Just for the record, the 5.16 kernel never reproduced the regression.
The real-life workload that triggers the regression, rpm package updates, was verified to work fine since 5.16~rc6 [1].
As expected, the synthetic workload from the script also produced normal results on the 5.16 and 5.17 kernels.

[1] https://bugzilla.opensuse.org/show_bug.cgi?id=1193549


3.1. Basic test report

opensuse tumbleweed ----- kernel 5.16.15 --- vanilla --- (kvm)
..updating 250 files on /mnt/inode-ev/zstd:         Job took 910 ms @inode_evictions: 250
..updating 250 files on /mnt/inode-ev/lzo:          Job took 740 ms @inode_evictions: 250
..updating 250 files on /mnt/inode-ev/uncompressed: Job took 717 ms @inode_evictions: 250
opensuse tumbleweed ----- kernel 5.17.1 ---- vanilla --- (kvm)
..updating 250 files on /mnt/inode-ev/zstd:         Job took 701 ms @inode_evictions: 250
..updating 250 files on /mnt/inode-ev/lzo:          Job took 695 ms @inode_evictions: 250
..updating 250 files on /mnt/inode-ev/uncompressed: Job took 954 ms @inode_evictions: 250


4.   Simplified test script

This simplified script sets up, formats and mounts a ramdisk block device.
It creates 3 test directories (zstd, lzo, uncompressed) and sets the btrfs compression property on each of them.
Each time the script is executed, 3 tests are run, and bpftrace is started right before each test.
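
Assuming bpftrace, xfs_io (from xfsprogs) and btrfs-progs are installed and the brd module is available, the script is meant to be run as root; a possible invocation (the log file name is just an example):

    sudo bash s3e_t3.sh 2>&1 | tee s3e_t3-$(uname -r).log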

#!/bin/bash
# s3e_t3.sh (based on s3e.sh version 4.7)
# Simple Syscall Signature Emulator (test3)
# test3: populate + test. test renameat2/openat + unlink syscalls w/ empty files (3x)
# Copyright (c) 2022 Bruno Damasceno <bdamasceno@xxxxxxxxxxxxxx>
# Warning: no safety checks

dir1=zstd
dir2=lzo
dir3=uncompressed
DIR=zzz
NUM_FILES=250
DEV=/dev/ram0
MNT=/mnt/inode-ev
DIR_1="$MNT/$dir1"
DIR_2="$MNT/$dir2"
DIR_3="$MNT/$dir3"

populate() {
    DIR=$1
    echo "...populating 1st generation of files on $DIR:"
    for ((i = 1; i <= $NUM_FILES; i++)); do
        echo -n > $DIR/file_$i
    done
    }

run_test() {
    DIR=$1
    sync
    xfs_io -c "fsync" $DIR
    echo -e "\n...updating $NUM_FILES files on $DIR:"
    #dumb pause so bpftrace has time to attach its probe
    sleep 3s
    start=$(date +%s%N)
    for ((i = 1; i <= $NUM_FILES; i++)); do
        mv $DIR/file_$i $DIR/file_$i-RPMDELETE
        unlink $DIR/file_$i-RPMDELETE
        echo -n > $DIR/file_$i
        echo -n "_$i"
        [ $i != $NUM_FILES ] && echo -ne "\r"
    done
    end=$(date +%s%N)
    dur=$(( (end - start) / 1000000 ))
    echo -ne "\r"
    echo "Job took $dur milliseconds"
    }

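# main: create a ram-backed block device, format it with btrfs and prepare the three test directories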
modprobe brd rd_size=128000 max_part=1 rd_nr=1
mkfs.btrfs --label inode-ev --force $DEV > /dev/null
mkdir $MNT
mount $DEV $MNT
mkdir $MNT/{$dir1,$dir2,$dir3}
btrfs property set $DIR_1 compression zstd:1
btrfs property set $DIR_2 compression lzo
btrfs property set $DIR_3 compression none

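# for each directory: populate it, then run the test with a bpftrace probe counting btrfs inode evictions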
for dir in "$DIR_1" "$DIR_2" "$DIR_3"
    do
        populate "$dir"
        bpftrace -e 'kprobe:btrfs_evict_inode { @inode_evictions = count(); }' & run_test "$dir"
        pkill bpftrace
        #dumb pause to wait for the bpftrace report
        sleep 2s
    done

umount $MNT
rm --dir $MNT
rmmod brd


