On Mon, Jan 4, 2021 at 5:06 PM Josef Bacik <josef@xxxxxxxxxxxxxx> wrote:
>
> This is a revert for 38d715f494f2 ("btrfs: use
> btrfs_start_delalloc_roots in shrink_delalloc"). A user reported a
> problem where performance was significantly worse with this patch
> applied. The problem needs to be fixed with proper pre-flushing and
> changes to how we deal with the work queues for the inodes. However,
> that work is much more complicated than is acceptable for stable, and
> simply reverting this patch fixes the problem. The original patch was
> a cleanup of the code, so it's fine to revert it. My numbers for the
> original reported test, which was untarring a copy of the firefox
> sources, are as follows:
>
> 5.9   0m54.258s
> 5.10  1m26.212s
> Fix   0m35.038s
>
> cc: stable@xxxxxxxxxxxxxxx # 5.10
> Reported-by: René Rebe <rene@xxxxxxxxxxxx>
> Fixes: 38d715f494f2 ("btrfs: use btrfs_start_delalloc_roots in shrink_delalloc")
> Signed-off-by: Josef Bacik <josef@xxxxxxxxxxxxxx>
> ---
> Dave, this is on top of Linus's branch, because we've changed the arguments for
> btrfs_start_delalloc_roots in misc-next, and this needs to go back to 5.10 ASAP.
> I can send a misc-next version if you want to have it there as well while we're
> waiting for it to go into Linus's tree, just let me know.

Adding this to stable releases will also make the following fix not
work on stable releases:

https://lore.kernel.org/linux-btrfs/39c2a60aa682f69f9823f51aa119d37ef4b9f834.1606909923.git.fdmanana@xxxxxxxx/

That is because the async reclaim task can now trigger writeback
through writeback_inodes_sb_nr() and not only through
btrfs_start_delalloc_roots().

Other than changing that patch to make extent_write_cache_pages() do
nothing when the inode has the bit BTRFS_INODE_NO_DELALLOC_FLUSH set,
I'm not seeing other simple ways to do it.

Thanks.
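
For illustration only, the kind of early-out described above could
look roughly like the sketch below. The helper name is made up, and it
assumes BTRFS_INODE_NO_DELALLOC_FLUSH ends up as a bit in
btrfs_inode::runtime_flags, as in the patch linked above; this is a
sketch of the idea, not the actual fix:

  /*
   * Sketch only: bail out of writeback early for inodes that the
   * linked patch marks as "do not flush delalloc".  Assumes the flag
   * is a bit in btrfs_inode::runtime_flags.
   */
  static bool btrfs_inode_no_delalloc_flush(struct inode *inode)
  {
          return test_bit(BTRFS_INODE_NO_DELALLOC_FLUSH,
                          &BTRFS_I(inode)->runtime_flags);
  }

  /* ... and near the top of extent_write_cache_pages(), which takes
   * the struct address_space *mapping:
   */
          if (btrfs_inode_no_delalloc_flush(mapping->host))
                  return 0;

With a check like that, the writeback started through
writeback_inodes_sb_nr() would simply skip the inodes that must not be
flushed from the reclaim context.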

>
>  fs/btrfs/space-info.c | 54 ++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 53 insertions(+), 1 deletion(-)
>
> diff --git a/fs/btrfs/space-info.c b/fs/btrfs/space-info.c
> index 64099565ab8f..a2b322275b8d 100644
> --- a/fs/btrfs/space-info.c
> +++ b/fs/btrfs/space-info.c
> @@ -465,6 +465,28 @@ void btrfs_dump_space_info(struct btrfs_fs_info *fs_info,
>         up_read(&info->groups_sem);
>  }
>
> +static void btrfs_writeback_inodes_sb_nr(struct btrfs_fs_info *fs_info,
> +                                        unsigned long nr_pages, u64 nr_items)
> +{
> +       struct super_block *sb = fs_info->sb;
> +
> +       if (down_read_trylock(&sb->s_umount)) {
> +               writeback_inodes_sb_nr(sb, nr_pages, WB_REASON_FS_FREE_SPACE);
> +               up_read(&sb->s_umount);
> +       } else {
> +               /*
> +                * We needn't worry the filesystem going from r/w to r/o though
> +                * we don't acquire ->s_umount mutex, because the filesystem
> +                * should guarantee the delalloc inodes list be empty after
> +                * the filesystem is readonly(all dirty pages are written to
> +                * the disk).
> +                */
> +               btrfs_start_delalloc_roots(fs_info, nr_items);
> +               if (!current->journal_info)
> +                       btrfs_wait_ordered_roots(fs_info, nr_items, 0, (u64)-1);
> +       }
> +}
> +
>  static inline u64 calc_reclaim_items_nr(struct btrfs_fs_info *fs_info,
>                                         u64 to_reclaim)
>  {
> @@ -490,8 +512,10 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
>         struct btrfs_trans_handle *trans;
>         u64 delalloc_bytes;
>         u64 dio_bytes;
> +       u64 async_pages;
>         u64 items;
>         long time_left;
> +       unsigned long nr_pages;
>         int loops;
>
>         /* Calc the number of the pages we need flush for space reservation */
> @@ -532,8 +556,36 @@ static void shrink_delalloc(struct btrfs_fs_info *fs_info,
>
>         loops = 0;
>         while ((delalloc_bytes || dio_bytes) && loops < 3) {
> -               btrfs_start_delalloc_roots(fs_info, items);
> +               nr_pages = min(delalloc_bytes, to_reclaim) >> PAGE_SHIFT;
> +
> +               /*
> +                * Triggers inode writeback for up to nr_pages. This will invoke
> +                * ->writepages callback and trigger delalloc filling
> +                * (btrfs_run_delalloc_range()).
> +                */
> +               btrfs_writeback_inodes_sb_nr(fs_info, nr_pages, items);
> +               /*
> +                * We need to wait for the compressed pages to start before
> +                * we continue.
> +                */
> +               async_pages = atomic_read(&fs_info->async_delalloc_pages);
> +               if (!async_pages)
> +                       goto skip_async;
> +
> +               /*
> +                * Calculate how many compressed pages we want to be written
> +                * before we continue. I.e if there are more async pages than we
> +                * require wait_event will wait until nr_pages are written.
> +                */
> +               if (async_pages <= nr_pages)
> +                       async_pages = 0;
> +               else
> +                       async_pages -= nr_pages;
>
> +               wait_event(fs_info->async_submit_wait,
> +                          atomic_read(&fs_info->async_delalloc_pages) <=
> +                          (int)async_pages);
> +skip_async:
>                 loops++;
>                 if (wait_ordered && !trans) {
>                         btrfs_wait_ordered_roots(fs_info, items, 0, (u64)-1);
> --
> 2.26.2
>

-- 
Filipe David Manana,

“Whether you think you can, or you think you can't — you're right.”
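
As a side note on the shrink_delalloc() hunk quoted above: the
async_pages adjustment before wait_event() just computes how far the
in-flight async delalloc page counter must drop before the loop
continues. Below is a standalone userspace-C illustration of that
arithmetic only; async_wait_target() is a made-up name used here for
reference, not anything in the kernel:

  #include <stdio.h>

  /*
   * Mirrors the wait-target calculation in the quoted hunk: if fewer
   * async pages are in flight than the nr_pages we asked writeback
   * for, wait for all of them; otherwise wait until the in-flight
   * count has dropped by nr_pages.
   */
  static unsigned long async_wait_target(unsigned long async_pages,
                                         unsigned long nr_pages)
  {
          if (async_pages <= nr_pages)
                  return 0;
          return async_pages - nr_pages;
  }

  int main(void)
  {
          printf("%lu\n", async_wait_target(100, 150)); /* prints 0   */
          printf("%lu\n", async_wait_target(500, 150)); /* prints 350 */
          return 0;
  }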