Re: Slow-tier Page Promotion discussion recap and open questions


On 12/30/2024 11:00 AM, David Rientjes wrote:
> On Thu, 19 Dec 2024, Shivank Garg wrote:
> 
>> On 12/18/2024 8:20 PM, Zi Yan wrote:
>>> On 17 Dec 2024, at 23:19, David Rientjes wrote:
>>>
>>>> Hi everybody,
>>>>
>>>> We had a very interactive discussion last week led by RaghavendraKT on
>>>> slow-tier page promotion intended for memory tiering platforms, thank
>>>> you!  Thanks as well to everybody who attended and provided great
>>>> questions, suggestions, and feedback.
>>>>
>>>> The RFC patch series "mm: slowtier page promotion based on PTE A bit"[1]
>>>> is a proposal to allow for asynchronous page promotion based on memory
>>>> accesses as an alternative to NUMA Balancing based promotions.  There was
>>>> widespread interest in this topic and the discussion surfaced multiple
>>>> use cases and requirements, very focused on CXL use cases.
>>>>
>>> <snip>
>>>> ----->o-----
>>>> I asked about offloading the migration to a data mover, such as the PSP
>>>> for AMD, DMA engine, etc and whether that should be treated entirely
>>>> separately as a topic.  Bharata said there was a proof-of-concept
>>>> available from AMD that does just that but the initial results were not
>>>> that encouraging.
>>>>
>>>> Zi asked if the DMA engine saturated the link between the slow and fast
>>>> tiers.  If we want to offload to a copy engine, we need to verify that
>>>> the throughput is sufficient or we may be better off using idle cpus to
>>>> perform the migration for us.
>>>
>>> <snip>
>>>>
>>>>  - we likely want to reconsider the single-threaded nature of the kthread
>>>>    even if only for NUMA purposes
>>>>
>>>
>>> Related to using a DMA engine and/or multiple threads for page migration, I had
>>> a patchset accelerating page migration[1] back in 2019. It showed a good
>>> throughput speedup, ~4x when using 16 threads to copy multiple 2MB THPs. I think
>>> it is time to revisit the topic.
>>>
>>>
>>> [1] https://lore.kernel.org/linux-mm/20190404020046.32741-1-zi.yan@xxxxxxxx/
>>
>> Hi All,
>>
>> I wanted to provide some additional context regarding the AMD DMA offloading
>> POC mentioned by Bharata:
>> https://lore.kernel.org/linux-mm/20240614221525.19170-1-shivankg@xxxxxxx
>>
>> While the initial results weren't as encouraging as hoped, I plan to improve this
>> in the next versions of the patchset.
>>
>> The core idea in my RFC patchset is restructuring the folio move operation
>> to better leverage DMA hardware. Instead of the current folio-by-folio approach:
>>
>> for_each_folio() {
>>     copy metadata + content + update PTEs
>> }
>>
>> We batch the operations to minimize overhead:
>>
>> for_each_folio() {
>>     copy metadata
>> }
>> DMA batch copy all content
>> for_each_folio() {
>>     update PTEs
>> }
>>
>> My experiment showed that folio copy can consume up to 26.6% of total migration
>> cost when moving data between NUMA nodes. This suggests significant room for
>> improvement through DMA offloading, particularly for the larger transfers expected
>> in CXL scenarios.
>>
>> It would be interesting to work on combining these approaches for optimized page
>> promotion.
>>
> 
> This is very exciting, thanks Shivank and Zi!  The reason I brought this 
> topic up during the session on asynchronous page promotion for memory 
> tiering was because page migration is likely going to become *much* more 
> popular and will be in the critical path under system-wide memory 
> pressure.  Hardware assist and any software optimizations that can go 
> along with it would certainly be very interesting to discuss.
> 
> Shivank, do you have an estimated timeline for when that patch series will 
> be refreshed?  Any planned integration with TMPM?

Hi David,

It's definitely interesting for us to get it working with SDXI.
I'm going to try it out.

Thanks,
Shivank

> 
> Zi, are you looking to refresh your series and continue discussing page 
> migration offload?  We could set up another Linux MM Alignment Session 
> topic focused exactly on this and get representatives from the vendors 
> involved.
> 
> Thanks!
