On Fri, Dec 27, 2024 at 10:40:36AM -0500, Gregory Price wrote:
> > Can we measure the largest improvement? For example, run the benchmark
> > with all file pages in DRAM and CXL.mem via numa binding, and compare.
>
> I can probably come up with something, will rework some stuff.
>

So I did as you suggested: I made a program that allocates a 16GB
buffer, initializes it, then membinds itself to node 1 before accessing
the file to force it into the page cache, then ran a bunch of tests.
(A rough sketch of the test program is appended below.)

Completely unexpected result: ~25% overhead from an inexplicable source.

baseline - no membind()
./test
Read loop took 0.93 seconds

drop caches

./test - w/ membind(1) just before file open
Read loop took 1.16 seconds
node 1 size: 262144 MB
node 1 free: 245756 MB   <- file confirmed in cache

kill and relaunch without membind to avoid any funny business
./test
Read loop took 1.16 seconds

enable promotion
Read loop took 3.37 seconds  <- migration overhead
... snip ...
Read loop took 1.17 seconds  <- stabilizes here
node 1 size: 262144 MB
node 1 free: 262144 MB   <- pagecache promoted

Absolutely bizarre result: there is 0% CXL usage occurring, but the
overhead we originally measured is still present.

This overhead persists even if I do the following:
  - disable pagecache promotion
  - disable numa_balancing
  - offline CXL memory entirely

This is actually pretty wild. I presume this must imply the folio flags
are mucked up after migration and we're incurring a bunch of overhead on
access for no reason.

At the very least it doesn't appear to be an isolated-folio issue:

nr_isolated_anon 0
nr_isolated_file 0

I'll have to dig into this further; I wonder if this happens with mapped
memory as well.

~Gregory
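
---
For reference, roughly what the test program looks like. This is a
minimal sketch assuming libnuma; the file path, chunk size, and timing
details are illustrative, not the exact program used above.

/*
 * Sketch of the pagecache/membind test: fault in a 16GB buffer, then
 * (optionally) membind to node 1 so the file's pagecache is allocated
 * there, and time a plain read() loop over the file.
 *
 * Build with: gcc -O2 -o test test.c -lnuma
 */
#include <fcntl.h>
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

#define BUF_SIZE   (16UL << 30)     /* 16GB destination buffer */
#define CHUNK      (2UL << 20)      /* read in 2MB chunks */
#define TEST_FILE  "/mnt/test/data" /* assumed path to a ~16GB file */

int main(int argc, char **argv)
{
	struct timespec start, end;
	size_t off = 0;
	ssize_t ret;
	double elapsed;
	char *buf;
	int fd;

	buf = malloc(BUF_SIZE);
	if (!buf)
		return 1;
	memset(buf, 0, BUF_SIZE);	/* fault the buffer in up front */

	/*
	 * With any argument, bind this task's allocations (including the
	 * pagecache pages it faults in) to node 1 before touching the file.
	 */
	if (argc > 1) {
		struct bitmask *nodes = numa_parse_nodestring("1");

		numa_set_membind(nodes);
		numa_free_nodemask(nodes);
	}

	fd = open(TEST_FILE, O_RDONLY);
	if (fd < 0)
		return 1;

	clock_gettime(CLOCK_MONOTONIC, &start);
	while (off < BUF_SIZE) {
		ret = read(fd, buf + off, CHUNK);
		if (ret <= 0)
			break;
		off += ret;
	}
	clock_gettime(CLOCK_MONOTONIC, &end);

	elapsed = (end.tv_sec - start.tv_sec) +
		  (end.tv_nsec - start.tv_nsec) / 1e9;
	printf("Read loop took %.2f seconds\n", elapsed);

	close(fd);
	free(buf);
	return 0;
}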