This is a note to let you know that I've just added the patch titled btrfs: do not wait for short bulk allocation to the 6.6-stable tree which can be found at: http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary The filename of the patch is: btrfs-do-not-wait-for-short-bulk-allocation.patch and it can be found in the queue-6.6 subdirectory. If you, or anyone else, feels it should not be added to the stable tree, please let <stable@xxxxxxxxxxxxxxx> know about it. >From 1db7959aacd905e6487d0478ac01d89f86eb1e51 Mon Sep 17 00:00:00 2001 From: Qu Wenruo <wqu@xxxxxxxx> Date: Tue, 26 Mar 2024 09:16:46 +1030 Subject: btrfs: do not wait for short bulk allocation From: Qu Wenruo <wqu@xxxxxxxx> commit 1db7959aacd905e6487d0478ac01d89f86eb1e51 upstream. [BUG] There is a recent report that when memory pressure is high (including cached pages), btrfs can spend most of its time on memory allocation in btrfs_alloc_page_array() for compressed read/write. [CAUSE] For btrfs_alloc_page_array() we always go alloc_pages_bulk_array(), and even if the bulk allocation failed (fell back to single page allocation) we still retry but with extra memalloc_retry_wait(). If the bulk alloc only returned one page a time, we would spend a lot of time on the retry wait. The behavior was introduced in commit 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations"). [FIX] Although the commit mentioned that other filesystems do the wait, it's not the case at least nowadays. All the mainlined filesystems only call memalloc_retry_wait() if they failed to allocate any page (not only for bulk allocation). If there is any progress, they won't call memalloc_retry_wait() at all. For example, xfs_buf_alloc_pages() would only call memalloc_retry_wait() if there is no allocation progress at all, and the call is not for metadata readahead. So I don't believe we should call memalloc_retry_wait() unconditionally for short allocation. Call memalloc_retry_wait() if it fails to allocate any page for tree block allocation (which goes with __GFP_NOFAIL and may not need the special handling anyway), and reduce the latency for btrfs_alloc_page_array(). Reported-by: Julian Taylor <julian.taylor@xxxxxxxx> Tested-by: Julian Taylor <julian.taylor@xxxxxxxx> Link: https://lore.kernel.org/all/8966c095-cbe7-4d22-9784-a647d1bf27c3@xxxxxxxx/ Fixes: 395cb57e8560 ("btrfs: wait between incomplete batch memory allocations") CC: stable@xxxxxxxxxxxxxxx # 6.1+ Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@xxxxxxxxxx> Reviewed-by: Filipe Manana <fdmanana@xxxxxxxx> Signed-off-by: Qu Wenruo <wqu@xxxxxxxx> Reviewed-by: David Sterba <dsterba@xxxxxxxx> Signed-off-by: David Sterba <dsterba@xxxxxxxx> Signed-off-by: Greg Kroah-Hartman <gregkh@xxxxxxxxxxxxxxxxxxx> --- fs/btrfs/extent_io.c | 14 ++------------ 1 file changed, 2 insertions(+), 12 deletions(-) --- a/fs/btrfs/extent_io.c +++ b/fs/btrfs/extent_io.c @@ -686,24 +686,14 @@ int btrfs_alloc_page_array(unsigned int unsigned int last = allocated; allocated = alloc_pages_bulk_array(GFP_NOFS, nr_pages, page_array); - - if (allocated == nr_pages) - return 0; - - /* - * During this iteration, no page could be allocated, even - * though alloc_pages_bulk_array() falls back to alloc_page() - * if it could not bulk-allocate. So we must be out of memory. - */ - if (allocated == last) { + if (unlikely(allocated == last)) { + /* No progress, fail and do cleanup. */ for (int i = 0; i < allocated; i++) { __free_page(page_array[i]); page_array[i] = NULL; } return -ENOMEM; } - - memalloc_retry_wait(GFP_NOFS); } return 0; } Patches currently in stable-queue which might be from wqu@xxxxxxxx are queue-6.6/btrfs-set-correct-ram_bytes-when-splitting-ordered-extent.patch queue-6.6/btrfs-always-clear-pertrans-metadata-during-commit.patch queue-6.6/btrfs-do-not-wait-for-short-bulk-allocation.patch queue-6.6/btrfs-make-btrfs_clear_delalloc_extent-free-delalloc.patch