On 2024/1/3 08:48, Matthew Wilcox wrote:
> On Tue, Jan 02, 2024 at 05:26:20PM +0100, David Sterba wrote:
>> On Fri, Dec 22, 2023 at 05:59:34PM +0800, kernel test robot wrote:
>>> Hello,
>>>
>>> kernel test robot noticed a -18.0% regression of stress-ng.link.ops_per_sec on:
>>>
>>> commit: 8d993618350c86da11cb408ba529c13e83d09527 ("btrfs: migrate get_eb_page_index() and get_eb_offset_in_page() to folios")
>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>> Unfortunately the conversion to folios adds a lot of assembly code and
>> we can't rely on constants like PAGE_SIZE anymore. The calculations in
>> extent buffer members are therefore slower; 18% is a lot, but within my
>> expected range for metadata-only operations.
>>
>> This could be improved by caching some values, like folio_size, so it's
>> a dereference and not a calculation of "PAGE_SIZE << folio_order" with
>> conditionals around it.
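
For illustration, the difference being described is roughly the
following (a simplified sketch with hypothetical names, not the actual
btrfs helpers):

#include <linux/mm.h>

/* Minimal stand-in for the relevant extent_buffer members (the real
 * structure lives in fs/btrfs/extent_io.h).
 */
struct extent_buffer {
        struct folio *folios[16];       /* backing folios */
        u32 folio_size;                 /* hypothetical cached folio size */
};

/*
 * Current form: folio_size() is roughly PAGE_SIZE << folio_order(),
 * i.e. a load of the order plus a variable shift, with a branch for
 * the single-page case.
 */
static inline size_t eb_offset_in_folio(const struct extent_buffer *eb,
                                        unsigned long offset)
{
        return offset & (folio_size(eb->folios[0]) - 1);
}

/*
 * Cached form: one load of a member that never changes for the
 * lifetime of the extent buffer, then a mask.
 */
static inline size_t eb_offset_in_folio_cached(const struct extent_buffer *eb,
                                               unsigned long offset)
{
        return offset & (eb->folio_size - 1);
}
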
> You're in the unfortunate position of paying all the costs of a variable
> folio size while not getting the benefit of variable folio sizes ...
No worries, IIRC the -next branch does NOT include the patch to enable
larger folios; it's there just to shake out bugs during the conversion.

We were already getting a bigger improvement in previous -next branches,
which included something resembling larger folios (not exactly the same
behavior, but using vm_map).
> There's no space in struct folio to cache folio_size(). It's an
> loff_t, so potentially huge. Also there are people who have designs
> on the remaining space in struct folio for a variety of purposes.
>
> Would it be better to use PAGE_SIZE * folio_nr_pages(), which is cached?
> That's at least a dereference, then a shift-variable-by-constant, rather
> than a dereference, then a shift-constant-by-variable.
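
As a side-by-side sketch of the two computations (illustrative only):

#include <linux/mm.h>

/*
 * PAGE_SIZE * folio_nr_pages(): load the page count cached in the
 * folio, then shift that variable left by the constant PAGE_SHIFT
 * (the multiply by a power-of-two constant becomes a shift).
 */
static inline size_t eb_folio_size_from_nr_pages(struct folio *folio)
{
        return PAGE_SIZE * folio_nr_pages(folio);
}

/*
 * PAGE_SIZE << folio_order(): load the order, then shift the constant
 * PAGE_SIZE left by that variable amount.
 */
static inline size_t eb_folio_size_from_order(struct folio *folio)
{
        return PAGE_SIZE << folio_order(folio);
}
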
The cache would be in a btrfs-specific structure, extent_buffer, so no
effect on the MM layer at all.

My plan is to cache a u8 for the shift (which can be fitted into some
hole), and a u32 for the folio size (which is only a 1.5% increase in
the size of extent_buffer).
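
Roughly something like this (a sketch of the idea only, not the actual
patch; it ignores details such as subpage handling):

#include <linux/mm.h>

/* Sketch of where the cached members could live (the real structure is
 * in fs/btrfs/extent_io.h; existing members elided).
 */
struct extent_buffer {
        u64 start;
        /* ... existing members ... */
        u32 folio_size;         /* cached folio_size() of the backing folios */
        u8 folio_shift;         /* cached PAGE_SHIFT + folio_order, fits in a hole */
};

/* The hot helpers then reduce to a mask and a shift on cached values
 * (simplified, not the actual implementation):
 */
static inline size_t get_eb_offset_in_folio(const struct extent_buffer *eb,
                                            unsigned long offset)
{
        return offset & (eb->folio_size - 1);
}

static inline unsigned long get_eb_folio_index(const struct extent_buffer *eb,
                                               unsigned long offset)
{
        return offset >> eb->folio_shift;
}
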
Thanks,
Qu