ext4 perf regression on LTS kernels

Hi folks,

We're trying to track down a severe performance regression in the 5.4
and older LTS trees. The regression seems to happen only on physical
spinning rust disks, which is probably why it went unnoticed.

The regression seems to be introduced in v4.7 with:

       1f60fbe72749 ("ext4: allow readdir()'s of large empty directories to be interrupted")

The fio test used to reproduce it is:

	sync; i=0; while [ $i -lt 4 ]; do
		( ( time fio --name=disk-burner --readwrite=write --bs=4096 \
			--invalidate=1 --end_fsync=0 --filesize=800M --runtime=120 \
			--ioengine=libaio --thread --numjobs=20 --iodepth=1 \
			--unlink=1 ) 2>&1 | grep '^real' )
		((i++))
	done

On a kernel that includes the offending commit, the test takes 3-4x
longer to complete.

The regression was fixed upstream somewhere in this merge:

       e5da4c933c50 ("Merge tag 'ext4_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4")

but it seems to be a combination of commits that fix it rather than a
single one.
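
For anyone who wants to look through the candidates, here is a rough
sketch of how the non-merge commits brought in by that merge could be
listed from a mainline checkout. The helper name list_merge_commits and
the GIT override are made up for illustration; the only assumption about
git itself is the usual <merge>^1..<merge> range, which selects what is
reachable from the merge but not from its first parent.

```shell
# GIT can be overridden (e.g. GIT=echo) to print the command instead of
# running it; by default the real git binary is used.
GIT=${GIT:-git}

# Hypothetical helper: list the non-merge commits that a merge pulled in,
# i.e. commits reachable from the merge but not from its first parent.
list_merge_commits() {
    "$GIT" log --oneline --no-merges "$1^1..$1"
}
```

Called as "list_merge_commits e5da4c933c50", this should print one line
per commit in the pull, which gives a starting list to bisect or
cherry-pick from.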

Now, here's the tricky part... reverting these two commits on top of
v4.19.118 "fixes" the issue:

       06bd3c36a733 ("ext4: fix data exposure after a crash")
       1f60fbe72749 ("ext4: allow readdir()'s of large empty directories to be interrupted")

but clearly this is not something we want to do in the stable trees, so
we're trying to figure out the proper way to fix this.
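
For reference, a minimal sketch of how the revert experiment can be set
up in a stable tree checkout. The helper name revert_on_base, the branch
name revert-test and the GIT override are illustrative only, and the
reverts may need manual conflict resolution depending on the base.

```shell
# GIT can be overridden (e.g. GIT=echo) for a dry run that only prints
# the git commands; by default the real git binary is used.
GIT=${GIT:-git}

# Hypothetical helper: create a scratch branch at the given base and
# revert the listed commits on top of it, in the order given.
revert_on_base() {
    rb_base=$1; shift
    # "revert-test" is an arbitrary scratch branch name.
    "$GIT" checkout -b revert-test "$rb_base" || return 1
    # --no-edit keeps the default revert commit messages.
    "$GIT" revert --no-edit "$@"
}
```

Usage would be "revert_on_base v4.19.118 06bd3c36a733 1f60fbe72749",
followed by building the kernel and rerunning the fio test above.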

--
Thanks,
Sasha
