Folios for anonymous memory

Ryan Roberts <ryan.roberts@xxxxxxx> · Wed, 15 Feb 2023 12:38:13 +0000

Hi Matthew, all,

I’ve recently been investigating some potential performance improvements, and I
think folios could help make them a reality. I’m hoping you can answer a few
questions to help me figure out whether this makes sense.

First, a quick summary of my benchmarking. I’ve been running a kernel
compilation test as well as the Speedometer browser performance benchmark (among
others), while trying to better understand the impact of page size on both HW
and SW. To do this, I’ve hacked the arm64 arch code to decouple the HW page size
(4K) from the kernel page size (16K). I then ran three kernels, all based on
v6.1: baseline-4k, baseline-16k, and my hacked-up hybrid-16k-4k. The aim was to
separate the speedup due solely to SW overhead reduction (baseline-4k ->
hybrid-16k-4k) from the speedup due to HW overhead reduction (the remaining
difference between hybrid-16k-4k and baseline-16k, expressed relative to
baseline-4k).

Results as follows:

Kernel compilation:
Speedup due to SW overhead reduction: 6.5%
Speedup due to HW overhead reduction: 5.0%
Total speedup: 11.5%

Speedometer 2.0:
Speedup due to SW overhead reduction: 5.3%
Speedup due to HW overhead reduction: 5.1%
Total speedup: 10.4%

Digging into the reasons for the SW-side speedup, I found that it boils down to
less bookkeeping: 4x fewer page faults and 4x fewer pages for which to manage
locks/refcounts/…, which leads to faster abort and syscall handling. I assume
these effects are already well understood in the folio context? Note, though,
that for these workloads the memory is primarily anonymous.

I’d like to figure out how to realise some of these benefits in a kernel that
still maintains a 4K-page user ABI. From reading old threads and LWN, and from
watching Matthew’s talk at OSS last summer, it sounds like this is exactly the
problem folios are intended to solve?

So a few questions:

- I’ve seen folios for anon memory listed as future work; what’s the current
status? Is anyone looking at this? If not, it’s something I would be interested
in taking a look at (although don’t take that as an actual commitment yet!).

- My understanding is that, as of v6.0 at least, XFS was the only filesystem
supporting large folios. Has that picture changed? Is there any likelihood of
seeing ext4 and f2fs support anytime soon?

- Matthew mentioned in the talk that he had data showing memory fragmentation
becoming less of an issue as more users were allocating large folios. Is that
data, or the experimental approach behind it, public?

Thanks,
Ryan