On Fri, Dec 27, 2024 at 10:40:36AM -0500, Gregory Price wrote:
> > Can we measure the largest improvement? For example, run the benchmark
> > with all file pages in DRAM and CXL.mem via numa binding, and compare.
>
> I can probably come up with something, will rework some stuff.
>

So I did as you suggested: I made a program that allocates a 16GB
buffer, initializes it, then membinds itself to node 1 before accessing
the file to force it into the page cache, then ran a bunch of tests.
(A rough sketch of the test program is appended below.)

Completely unexpected result: ~25% overhead from an inexplicable source.

baseline - no membind()
./test
Read loop took 0.93 seconds

drop caches

./test - w/ membind(1) just before file open
Read loop took 1.16 seconds
node 1 size: 262144 MB
node 1 free: 245756 MB   <- file confirmed in cache

kill and relaunch without membind to avoid any funny business
./test
Read loop took 1.16 seconds

enable promotion
Read loop took 3.37 seconds  <- migration overhead
... snip ...
Read loop took 1.17 seconds  <- stabilizes here
node 1 size: 262144 MB
node 1 free: 262144 MB   <- pagecache promoted

Absolutely bizarre result: there is 0% CXL usage occurring, but the
overhead we originally measured is still present.

This overhead persists even if I do the following:
  - disable pagecache promotion
  - disable numa_balancing
  - offline CXL memory entirely

This is actually pretty wild. I presume this must imply the folio flags
are mucked up after migration and we're incurring a bunch of overhead on
access for no reason.

At the very least it doesn't appear to be an isolated-folio issue:

nr_isolated_anon 0
nr_isolated_file 0

I'll have to dig into this further; I wonder if this happens with mapped
memory as well.

~Gregory
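
---
For reference, roughly what the test program looks like. This is a
minimal sketch assuming libnuma; the file path, chunk size, and timing
details are illustrative, not the exact program used above.

/*
 * Sketch of the pagecache/membind test: fault in a 16GB buffer, then
 * (optionally) membind to node 1 so the file's pagecache is allocated
 * there, and time a plain read() loop over the file.
 *
 * Build with: gcc -O2 -o test test.c -lnuma
 */
#include <fcntl.h>
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BUF_SIZE   (16UL << 30)     /* 16GB destination buffer */
#define CHUNK      (2UL << 20)      /* read in 2MB chunks */
#define TEST_FILE  "/mnt/test/data" /* assumed path to a ~16GB file */

int main(int argc, char **argv)
{
	struct timespec start, end;
	size_t off = 0;
	ssize_t ret;
	double elapsed;
	char *buf;
	int fd;

	buf = malloc(BUF_SIZE);
	if (!buf)
		return 1;
	memset(buf, 0, BUF_SIZE);	/* fault the buffer in up front */

	/*
	 * With any argument, bind this task's allocations (including the
	 * pagecache pages it faults in) to node 1 before touching the file.
	 */
	if (argc > 1) {
		struct bitmask *nodes = numa_parse_nodestring("1");

		numa_set_membind(nodes);
		numa_free_nodemask(nodes);
	}

	fd = open(TEST_FILE, O_RDONLY);
	if (fd < 0)
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	while (off < BUF_SIZE) {
		ret = read(fd, buf + off, CHUNK);
		if (ret <= 0)
			break;
		off += ret;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	elapsed = (end.tv_sec - start.tv_sec) +
		  (end.tv_nsec - start.tv_nsec) / 1e9;
	printf("Read loop took %.2f seconds\n", elapsed);

	close(fd);
	free(buf);
	return 0;
}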