Re: Slow-tier Page Promotion discussion recap and open questions

Raghavendra K T <rkodsara@xxxxxxx> · Fri, 20 Dec 2024 16:58:39 +0530

On 12/18/2024 8:51 PM, Nadav Amit wrote:

On 18 Dec 2024, at 6:19, David Rientjes <rientjes@xxxxxxxxxx> wrote:

Hi everybody,

We had a very interactive discussion last week, led by Raghavendra KT, on
slow-tier page promotion for memory tiering platforms; thank you!
Thanks as well to everybody who attended and provided great questions,
suggestions, and feedback.

The RFC patch series "mm: slowtier page promotion based on PTE A bit"[1]
is a proposal to allow for asynchronous page promotion based on memory
accesses as an alternative to NUMA Balancing-based promotion.  There was
widespread interest in this topic, and the discussion surfaced multiple
use cases and requirements, with a heavy focus on CXL.

Just sharing my 2 cents.

IIUC, the suggested approach has two benefits:

1. Fewer/no page-faults (as A-bit is used to detect usage)
2. Batching
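A minimal simulation of the two benefits listed above, assuming a toy page table rather than real PTEs: a scanner test-and-clears a per-page "accessed" flag (standing in for the hardware-set PTE A bit) instead of taking a hinting fault, and pages that stay hot across several scans are queued and promoted in batches. `SimPage`, `scan_pass`, `promote_batch`, `HOT_THRESHOLD`, and `BATCH_MAX` are hypothetical names for illustration; this is not the RFC's code.

```python
from dataclasses import dataclass

HOT_THRESHOLD = 2  # hypothetical: consecutive scans that must see the A bit set
BATCH_MAX = 4      # hypothetical: pages migrated per batch


@dataclass
class SimPage:
    pfn: int
    accessed: bool = False   # stands in for the PTE A bit set by hardware
    hot_scans: int = 0       # consecutive scans that found the bit set
    slow_tier: bool = True   # True while the page lives on the slow tier


def promote_batch(batch):
    """Migrate a whole batch in one go (here: just flip the tier flag)."""
    for page in batch:
        page.slow_tier = False
    return [page.pfn for page in batch]


def scan_pass(pages):
    """One scan: test-and-clear the A bit, track hotness, batch promotions.

    No page fault is taken; access information comes only from the bit.
    Returns the batches (lists of pfns) promoted during this pass.
    """
    batches, pending = [], []
    for page in pages:
        was_accessed, page.accessed = page.accessed, False  # test-and-clear
        page.hot_scans = page.hot_scans + 1 if was_accessed else 0
        if page.slow_tier and page.hot_scans >= HOT_THRESHOLD:
            pending.append(page)
            if len(pending) == BATCH_MAX:  # flush a full batch
                batches.append(promote_batch(pending))
                pending = []
    if pending:  # flush the remainder at the end of the pass
        batches.append(promote_batch(pending))
    return batches


# Usage: the workload touches pfns 0-4 in each of two scan intervals.
pages = [SimPage(pfn) for pfn in range(8)]
for _ in range(2):
    for page in pages[:5]:
        page.accessed = True
    result = scan_pass(pages)
print(result)  # [[0, 1, 2, 3], [4]]
```

The threshold trades promotion latency for fewer false positives; batching amortizes migration setup across pages, which is the part that could also sit on top of AutoNUMA.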

While (2) seems like a win that might be added on top of AutoNUMA, (1)
is more delicate. As indicated in the patch set, the "exact destination"
is lost. At the same time, the last time I checked, setting the A bit
wasn’t free: it cost roughly 550 cycles (others saw similar
results [1]).

So, considering that an empty page fault costs ~1050 cycles (a 2014 number
Linus measured [2]), it is unclear how big a win this is...
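A back-of-the-envelope comparison using only the two figures quoted above (~550 cycles for the A-bit update, ~1050 cycles for an empty page fault). It bounds the per-access saving when an A-bit update replaces a hinting fault, and ignores batching gains and the scanner-side work both schemes do (PTE updates, TLB flushes). `scan_cost` is a hypothetical knob for any extra per-page cost the A-bit scanner pays; the thread gives no figure for it.

```python
ABIT_SET_CYCLES = 550      # A-bit update cost cited above
EMPTY_FAULT_CYCLES = 1050  # empty page-fault cost cited above (2014)


def saving_per_access(scan_cost=0):
    """Cycles saved per detected access when an A-bit update replaces a fault.

    scan_cost is a hypothetical extra per-page cost on the A-bit side.
    """
    return EMPTY_FAULT_CYCLES - (ABIT_SET_CYCLES + scan_cost)


def saving_fraction(scan_cost=0):
    """Saving as a fraction of the page-fault cost."""
    return saving_per_access(scan_cost) / EMPTY_FAULT_CYCLES


print(saving_per_access())               # 500
print(round(saving_fraction(), 3))       # 0.476
print(saving_per_access(scan_cost=500))  # 0, i.e. break-even
```

Under these numbers the A-bit path saves at most about half of the fault cost per access, and roughly 500 extra cycles of per-page scanning overhead would erase it.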

[1] https://lore.kernel.org/all/20160620000606.GB3194@blaptop/
[2] Google+ post RIP

Thanks for the feedback. As I noted in another post, could the hot-VMA
information detected by A-bit scanning be fed into NUMAB=1 scanning?
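One possible reading of the question above, as a sketch under stated assumptions: the A-bit scanner summarizes per-VMA hotness (fraction of pages found accessed), and the NUMA Balancing (NUMAB=1) scanner spends its per-interval page budget on the hottest VMAs first instead of walking them in address order. The budget loosely plays the role of the `numa_balancing_scan_size_mb` limit; `Vma`, `hot_fraction`, and `plan_numab_scan` are hypothetical and not existing kernel interfaces.

```python
from dataclasses import dataclass


@dataclass
class Vma:
    start: int           # hypothetical: first page number of the VMA
    npages: int
    accessed_pages: int  # pages the A-bit scanner last found accessed

    @property
    def hot_fraction(self):
        return self.accessed_pages / self.npages if self.npages else 0.0


def plan_numab_scan(vmas, budget_pages):
    """Return (start, npages) ranges for NUMA Balancing to scan this interval.

    VMAs are visited hottest-first (Python's sort is stable even with
    reverse=True, so ties keep address order); the last range may be partial.
    Cold VMAs are not skipped, only deferred until budget remains.
    """
    plan = []
    for vma in sorted(vmas, key=lambda v: v.hot_fraction, reverse=True):
        if budget_pages <= 0:
            break
        take = min(vma.npages, budget_pages)
        if take:
            plan.append((vma.start, take))
            budget_pages -= take
    return plan


# Usage: the A-bit scan found the VMA at 1000 much hotter than the others.
vmas = [Vma(0, 100, 5), Vma(1000, 50, 40), Vma(2000, 200, 0)]
print(plan_numab_scan(vmas, budget_pages=120))  # [(1000, 50), (0, 70)]
```

Deferring rather than skipping cold VMAs keeps NUMA Balancing able to notice phase changes that the A-bit summary has not yet caught.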

Regards
- Raghu