On Thu, Jan 07, 2021 at 05:08:30PM -0500, Josef Bacik wrote:
> Commit 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in
> shrink_delalloc") cleaned up how we do delalloc shrinking by utilizing
> some infrastructure we have in place to flush inodes that we use for
> device replace and snapshot.  However this introduced a pretty serious
> performance regression.  To reproduce the user untarred the source
> tarball of Firefox, and would see it take anywhere from 5 to 20 times as
> long to untar in 5.10 compared to 5.9.
>
> The root cause is because before we would generally use the normal
> writeback path to reclaim delalloc space, and for this we would provide
> it with the number of pages we wanted to flush.  The referenced commit
> changed this to flush that many inodes, which drastically increased the
> amount of space we were flushing in certain cases, which severely
> affected performance.
>
> We cannot revert this patch unfortunately because of
>
>   btrfs: fix deadlock when cloning inline extent and low on free
>   metadata space
>
> which requires the ability to skip flushing inodes that are being cloned
> in certain scenarios, which means we need to keep using our flushing
> infrastructure or risk re-introducing the deadlock.
>
> Instead to fix this problem we can go back to providing
> btrfs_start_delalloc_roots with a number of pages to flush, and then set
> up a writeback_control and utilize sync_inode() to handle the flushing
> for us.  This gives us the same behavior we had prior to the fix, while
> still allowing us to avoid the deadlock that was fixed by Filipe.
> I redid the user's original test and got the following results on one of
> our test machines (256GiB of RAM, 56 cores, 2TiB Intel NVMe drive)
>
>   5.9           0m54.258s
>   5.10          1m26.212s
>   5.10+patch    0m38.800s
>
> 5.10+patch is significantly faster than plain 5.9 because of my patch
> series "Change data reservations to use the ticketing infra" which
> contained the patch that introduced the regression, but generally
> improved the overall ENOSPC flushing mechanisms.
>
> CC: stable@xxxxxxxxxxxxxxx # 5.10
> Reported-by: René Rebe <rene@xxxxxxxxxxxx>
> Fixes: 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> ---
> v2->v3:
> - modified the changelog to add information about the patches referenced, and
>   detail the specs of the machine I used for the performance numbers.

Great, thanks. Meanwhile I ran some other tests: 'dbench 32' is basically
the same, and so is async random write with
'fio --rw=randwrite --size=4g --ioengine=libaio'.

I'm going to send another rc3 pull request with this patch so we can get
it to 5.10 stable.
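
For readers following along, the difference between "flush N inodes" and "flush N pages" described in the changelog can be illustrated with a small user-space toy model. This is not kernel code and not the patch itself: ToyInode, flush_by_inode_count, flush_by_page_budget and make_inodes are hypothetical names, and the inode and page counts are made up. The page budget is only meant to resemble how a writeback_control carries an nr_to_write limit across calls.

```python
# Toy user-space model, NOT kernel code. All names and numbers are
# hypothetical and exist only to illustrate the changelog's reasoning.
from dataclasses import dataclass


@dataclass
class ToyInode:
    dirty_pages: int  # delalloc pages still waiting for writeback


def flush_by_inode_count(inodes, nr_inodes):
    """Regressed behaviour: the request is read as a number of inodes,
    and each selected inode is flushed completely."""
    flushed = 0
    for inode in inodes[:nr_inodes]:
        flushed += inode.dirty_pages
        inode.dirty_pages = 0
    return flushed


def flush_by_page_budget(inodes, nr_to_write):
    """Fixed behaviour: a page budget is carried from inode to inode
    and flushing stops as soon as the budget is used up."""
    flushed = 0
    for inode in inodes:
        if nr_to_write <= 0:
            break
        written = min(inode.dirty_pages, nr_to_write)
        inode.dirty_pages -= written
        nr_to_write -= written
        flushed += written
    return flushed


def make_inodes():
    # Assumed workload: 1000 inodes with 256 dirty pages each.
    return [ToyInode(dirty_pages=256) for _ in range(1000)]


if __name__ == "__main__":
    want = 64  # the caller wanted roughly 64 pages reclaimed
    print("by inode count:", flush_by_inode_count(make_inodes(), want))
    print("by page budget:", flush_by_page_budget(make_inodes(), want))
```

With these made-up numbers the inode-count variant writes 16384 pages for a request of 64, while the page-budget variant writes exactly 64. The real ratio depends entirely on how much dirty data each inode holds.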