Hi! On Thu 27-08-20 12:09:21, James Scriven (jamscriv) wrote: > Hi, I'm working on migrating a workload from kernel 2.6 to 4.18 (REHL6 to > RHEL8). > > The use case is a build farm that has a basic workflow of: > > 1) rm -rf a large directory tree (about 2M files ~ 200GB) to free some space > 2) download and extract a large tarbar (about 2M files ~ 200GB) > 3) perform a build in the extracted directory tree > Repeat... > > We've being using an ext4 filesystem with no journal for maximum > performance with great success. We're not very concerned about losing > data, but do want some persistence, which is why we don't just use tmpfs > for this. We'll keep a number of these large workspaces around as long as > space permits, and delete the oldest (step 1) just before starting a new > one (step 2). > > When migrating to this newer kernel, we are seeing performance > degradation when we expand the tar, which I suspect is caused by inode > allocation trying to find an unused inode that has not been used too > recently. Since we have 2M deleted inodes that *have* been recently > deleted, every one of the new 2M inodes has to search through all 2M of > the deleted ones (or something to that approximation - my full > understanding of the ext4 code is limited). > > The simple testcase below shows the issue. My question is, is this edge > case already understood? Is there a good way to re-gain this lost > performance? Adding a "sync + drop_caches", or a sufficiently long sleep, > between steps 1 and 2 will work around the issue, but is not ideal. So from the tests below it isn't obvious to me that the recently_deleted() logic is the culprit of the problem. But it is quite possible. We have somewhat changed the logic in commit d05466b27b19 "ext4: avoid ENOSPC when avoiding to reuse recently deleted inodes" so now we just reuse recently deleted inode if we cannot find any better in the current group. That should significantly reduce the cost of searching for free inode in your usecase so you might give that change a try... Honza > # each iteration of the test case the number of recently deleted inodes increases and performance degrades. > > $ uname -a > Linux sjc-asr-bm-470 4.18.0-147.3.1.el8_1.x86_64 #1 SMP Wed Nov 27 01:11:44 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux > $ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches; for x in {1..10}; do rm -rf dirtree; mkdir dirtree; time mkdir dirtree/{1..50000}; done > 3 > > real 0m1.796s > user 0m0.041s > sys 0m1.528s > > real 0m3.280s > user 0m0.035s > sys 0m3.235s > > real 0m4.329s > user 0m0.035s > sys 0m4.279s > > real 0m6.033s > user 0m0.032s > sys 0m5.988s > > real 0m7.303s > user 0m0.041s > sys 0m7.246s > > real 0m7.874s > user 0m0.032s > sys 0m7.826s > > real 0m9.376s > user 0m0.036s > sys 0m9.323s > > real 0m9.979s > user 0m0.052s > sys 0m9.910s > > real 0m9.808s > user 0m0.037s > sys 0m9.749s > > real 0m9.067s > user 0m0.038s > sys 0m9.011s > > > > > $ uname -a > Linux sjc-asr-bm-100 2.6.32-754.17.1.el6.x86_64 #1 SMP Thu Jun 20 11:47:12 EDT 2019 x86_64 x86_64 x86_64 GNU/Linux > $ sync; echo 3 | sudo tee /proc/sys/vm/drop_caches; for x in {1..10}; do rm -rf dirtree; mkdir dirtree; time mkdir dirtree/{1..50000}; done > 3 > > real 0m0.724s > user 0m0.031s > sys 0m0.693s > > real 0m0.762s > user 0m0.041s > sys 0m0.721s > > real 0m0.717s > user 0m0.043s > sys 0m0.674s > > real 0m0.712s > user 0m0.037s > sys 0m0.675s > > real 0m0.749s > user 0m0.036s > sys 0m0.712s > > real 0m0.710s > user 0m0.040s > sys 0m0.670s > > real 0m0.746s > user 0m0.038s > sys 0m0.707s > > real 0m0.715s > user 0m0.034s > sys 0m0.680s > > real 0m0.747s > user 0m0.040s > sys 0m0.707s > > real 0m0.732s > user 0m0.042s > sys 0m0.690s > > > -- Jan Kara <jack@xxxxxxxx> SUSE Labs, CR