On 2024/1/3 08:48, Matthew Wilcox wrote:
> On Tue, Jan 02, 2024 at 05:26:20PM +0100, David Sterba wrote:
>> On Fri, Dec 22, 2023 at 05:59:34PM +0800, kernel test robot wrote:
>>> Hello,
>>>
>>> kernel test robot noticed a -18.0% regression of stress-ng.link.ops_per_sec on:
>>>
>>> commit: 8d993618350c86da11cb408ba529c13e83d09527 ("btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>> Unfortunately the conversion to folios adds a lot of assembly code and
>> we can't rely on constants like PAGE_SIZE anymore. The calculations in
>> extent buffer members are therefore slower; 18% is a lot, but within my
>> expected range for metadata-only operations.
>>
>> This could be improved by caching some values, like folio_size, so it's
>> a dereference and not a calculation of "PAGE_SIZE << folio_order" with
>> conditionals around it.
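
For illustration, the difference being described is roughly the
following (a simplified sketch with hypothetical names, not the actual
btrfs helpers):

#include <linux/mm.h>

/* Minimal stand-in for the relevant extent_buffer members (the real
 * structure lives in fs/btrfs/extent_io.h).
 */
struct extent_buffer {
        struct folio *folios[16];       /* backing folios */
        u32 folio_size;                 /* hypothetical cached folio size */
};

/*
 * Current form: folio_size() is roughly PAGE_SIZE << folio_order(),
 * i.e. a load of the order plus a variable shift, with a branch for
 * the single-page case.
 */
static inline size_t eb_offset_in_folio(const struct extent_buffer *eb,
                                        unsigned long offset)
{
        return offset & (folio_size(eb->folios[0]) - 1);
}

/*
 * Cached form: one load of a member that never changes for the
 * lifetime of the extent buffer, then a mask.
 */
static inline size_t eb_offset_in_folio_cached(const struct extent_buffer *eb,
                                               unsigned long offset)
{
        return offset & (eb->folio_size - 1);
}
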
> You're in the unfortunate position of paying all the costs of a variable
> folio size while not getting the benefit of variable folio sizes ...
No worries, IIRC the -next branch does NOT include the patch to enable
larger folios; it's there just to shake out bugs during the conversion.

We were already getting a bigger improvement in previous -next branches,
which included something resembling larger folios (not exactly the same
behavior, but using vm_map).
> There's no space in struct folio to cache folio_size(). It's an
> loff_t, so potentially huge. Also there are people who have designs
> on the remaining space in struct folio for a variety of purposes.
>
> Would it be better to use PAGE_SIZE * folio_nr_pages(), which is cached?
> That's at least a dereference, then a shift-variable-by-constant, rather
> than a dereference, then a shift-constant-by-variable.
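
As a side-by-side sketch of the two computations (illustrative only):

#include <linux/mm.h>

/*
 * PAGE_SIZE * folio_nr_pages(): load the page count cached in the
 * folio, then shift that variable left by the constant PAGE_SHIFT
 * (the multiply by a power-of-two constant becomes a shift).
 */
static inline size_t eb_folio_size_from_nr_pages(struct folio *folio)
{
        return PAGE_SIZE * folio_nr_pages(folio);
}

/*
 * PAGE_SIZE << folio_order(): load the order, then shift the constant
 * PAGE_SIZE left by that variable amount.
 */
static inline size_t eb_folio_size_from_order(struct folio *folio)
{
        return PAGE_SIZE << folio_order(folio);
}
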
The cache would be in a btrfs-specific structure, extent_buffer, so no
effect on the MM layer at all.

My plan is to cache a u8 for the shift (which can be fitted into some
hole), and a u32 for the folio size (which is only a 1.5% increase in
the size of extent_buffer).
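
Roughly something like this (a sketch of the idea only, not the actual
patch; it ignores details such as subpage handling):

#include <linux/mm.h>

/* Sketch of where the cached members could live (the real structure is
 * in fs/btrfs/extent_io.h; existing members elided).
 */
struct extent_buffer {
        u64 start;
        /* ... existing members ... */
        u32 folio_size;         /* cached folio_size() of the backing folios */
        u8 folio_shift;         /* cached PAGE_SHIFT + folio_order, fits in a hole */
};

/* The hot helpers then reduce to a mask and a shift on cached values
 * (simplified, not the actual implementation):
 */
static inline size_t get_eb_offset_in_folio(const struct extent_buffer *eb,
                                            unsigned long offset)
{
        return offset & (eb->folio_size - 1);
}

static inline unsigned long get_eb_folio_index(const struct extent_buffer *eb,
                                               unsigned long offset)
{
        return offset >> eb->folio_shift;
}
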
Thanks,
Qu