> On 18 Dec 2024, at 6:19, David Rientjes <rientjes@xxxxxxxxxx> wrote:
>
> Hi everybody,
>
> We had a very interactive discussion last week led by RaghavendraKT on
> slow-tier page promotion intended for memory tiering platforms, thank
> you! Thanks as well to everybody who attended and provided great
> questions, suggestions, and feedback.
>
> The RFC patch series "mm: slowtier page promotion based on PTE A bit"[1]
> is a proposal to allow for asynchronous page promotion based on memory
> accesses as an alternative to NUMA Balancing based promotions. There was
> widespread interest in this topic and the discussion surfaced multiple
> use cases and requirements, very focused on CXL use cases.
>

Just sharing my 2 cents. IIUC, the suggested approach has two benefits:

1. Fewer/no page-faults (as A-bit is used to detect usage)
2. Batching

While (2) seems like a win that might be added on top of AutoNUMA, (1) is
more delicate. As indicated in the patch-set, the "exact destination" is
lost. At the same time, the last time I checked, setting the A-bit wasn’t
free and cost something like 550 cycles (others saw similar results [1]).
So considering that an empty page-fault is ~1050 cycles (a 2014 number
Linus measured [2]), there is a question of how big a win it is...

[1] https://lore.kernel.org/all/20160620000606.GB3194@blaptop/
[2] Google+ post RIP
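
To put rough numbers on the comparison, here is a minimal sketch that uses only the two figures quoted in this message (~550 cycles for an A-bit update, ~1050 cycles for an empty page-fault). It assumes one A-bit update replaces exactly one empty fault and ignores every other cost of either scheme; the function name is made up for illustration.

```python
# Both figures are the approximate measurements quoted in the message.
A_BIT_SET_CYCLES = 550      # approximate cost of the hardware setting the A-bit
EMPTY_FAULT_CYCLES = 1050   # approximate cost of an empty page-fault (2014 figure)


def per_event_saving(fault_cycles: int, a_bit_cycles: int) -> tuple[int, float]:
    """Return (cycles saved, fraction of the fault cost saved) for one event.

    Hypothetical helper: it models a single A-bit update replacing a single
    empty page-fault and nothing else.
    """
    saved = fault_cycles - a_bit_cycles
    return saved, saved / fault_cycles


saved, fraction = per_event_saving(EMPTY_FAULT_CYCLES, A_BIT_SET_CYCLES)
print(f"saved per event: {saved} cycles ({fraction:.0%} of an empty fault)")
```

Under these assumptions the saving is about 500 cycles per event, i.e. a bit under half the cost of an empty fault rather than the whole of it, which is why the size of the win from (1) is an open question.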