Re: Slow-tier Page Promotion discussion recap and open questions

Zi Yan <ziy@xxxxxxxxxx> · Wed, 18 Dec 2024 09:50:08 -0500

On 17 Dec 2024, at 23:19, David Rientjes wrote:

> Hi everybody,
>
> We had a very interactive discussion last week led by RaghavendraKT on
> slow-tier page promotion intended for memory tiering platforms, thank
> you!  Thanks as well to everybody who attended and provided great
> questions, suggestions, and feedback.
>
> The RFC patch series "mm: slowtier page promotion based on PTE A bit"[1]
> is a proposal to allow for asynchronous page promotion based on memory
> accesses as an alternative to NUMA Balancing based promotions.  There was
> widespread interest in this topic and the discussion surfaced multiple
> use cases and requirements, very focused on CXL use cases.
>
<snip>
> ----->o-----
> I asked about offloading the migration to a data mover, such as the PSP
> for AMD, DMA engine, etc and whether that should be treated entirely
> separately as a topic.  Bharata said there was a proof-of-concept
> available from AMD that does just that but the initial results were not
> that encouraging.
>
> Zi asked if the DMA engine saturated the link between the slow and fast
> tiers.  If we want to offload to a copy engine, we need to verify that
> the throughput is sufficient or we may be better off using idle cpus to
> perform the migration for us.

<snip>
>
>  - we likely want to reconsider the single threaded nature of the kthread
>    even if only for NUMA purposes
>

Related to using a DMA engine and/or multiple threads for page migration,
I had a patchset accelerating page migration[1] back in 2019. It showed a
good throughput speedup, ~4x, when using 16 threads to copy multiple 2MB
THPs. I think it is time to revisit the topic.
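
For readers unfamiliar with the idea, here is a minimal userspace sketch of
splitting one 2MB copy across several threads. It is not the code from the
patchset in [1]: the names parallel_copy, copy_worker, struct copy_chunk
and MAX_COPY_THREADS are hypothetical, and plain pthreads plus memcpy()
stand in for whatever kernel worker and page-copy primitives a real
implementation would use.

```c
/* Userspace illustration only; all names here are hypothetical. */
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

#define HUGE_PAGE_SIZE   (2UL << 20)  /* 2 MiB, matching the THP size discussed */
#define MAX_COPY_THREADS 16

struct copy_chunk {       /* one contiguous slice handed to one worker */
	void *dst;
	const void *src;
	size_t len;
};

static void *copy_worker(void *arg)
{
	struct copy_chunk *c = arg;

	memcpy(c->dst, c->src, c->len);
	return NULL;
}

/*
 * Copy len bytes from src to dst using up to nthreads workers.
 * Returns 0 on success, -1 if nthreads is out of range.
 * If a worker cannot be created, the caller copies the remainder itself.
 */
static int parallel_copy(void *dst, const void *src, size_t len,
			 unsigned int nthreads)
{
	pthread_t tids[MAX_COPY_THREADS];
	struct copy_chunk chunks[MAX_COPY_THREADS];
	size_t chunk, off = 0;
	unsigned int i, started = 0;

	if (nthreads == 0 || nthreads > MAX_COPY_THREADS)
		return -1;

	/* Round up so the slices cover the whole range. */
	chunk = (len + nthreads - 1) / nthreads;

	for (i = 0; i < nthreads && off < len; i++) {
		size_t n = (len - off < chunk) ? len - off : chunk;

		chunks[i].dst = (char *)dst + off;
		chunks[i].src = (const char *)src + off;
		chunks[i].len = n;
		if (pthread_create(&tids[i], NULL, copy_worker, &chunks[i]) != 0)
			break;
		started++;
		off += n;
	}

	assert(off <= len);
	if (off < len)  /* fallback: finish the tail in the calling thread */
		memcpy((char *)dst + off, (const char *)src + off, len - off);

	for (i = 0; i < started; i++)
		pthread_join(tids[i], NULL);

	return 0;
}
```

Calling parallel_copy(dst, src, HUGE_PAGE_SIZE, 16) gives each worker a
128KB slice. Whether this beats a single memcpy() depends on the memory
bandwidth available between the tiers, which is the same saturation
question raised above for the DMA engine.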

[1] https://lore.kernel.org/linux-mm/20190404020046.32741-1-zi.yan@xxxxxxxx/

Best Regards,
Yan, Zi