On 17 Dec 2024, at 23:19, David Rientjes wrote:

> Hi everybody,
>
> We had a very interactive discussion last week led by RaghavendraKT on
> slow-tier page promotion intended for memory tiering platforms, thank
> you!  Thanks as well to everybody who attended and provided great
> questions, suggestions, and feedback.
>
> The RFC patch series "mm: slowtier page promotion based on PTE A bit"[1]
> is a proposal to allow for asynchronous page promotion based on memory
> accesses as an alternative to NUMA Balancing based promotions.  There was
> widespread interest in this topic and the discussion surfaced multiple
> use cases and requirements, very focused on CXL use cases.
>
<snip>
> ----->o-----
> I asked about offloading the migration to a data mover, such as the PSP
> for AMD, DMA engine, etc and whether that should be treated entirely
> separately as a topic.  Bharata said there was a proof-of-concept
> available from AMD that does just that but the initial results were not
> that encouraging.
>
> Zi asked if the DMA engine saturated the link between the slow and fast
> tiers.  If we want to offload to a copy engine, we need to verify that
> the throughput is sufficient or we may be better off using idle cpus to
> perform the migration for us.

<snip>
>
>  - we likely want to reconsider the single threaded nature of the kthread
>    even if only for NUMA purposes
>

Related to using a DMA engine and/or multiple threads for page migration,
I posted a patchset accelerating page migration[1] back in 2019. It showed
a good throughput speedup, ~4x when using 16 threads to copy multiple 2MB
THPs. I think it is time to revisit the topic.

[1] https://lore.kernel.org/linux-mm/20190404020046.32741-1-zi.yan@xxxxxxxx/

Best Regards,
Yan, Zi