On 24/01/25 10:26AM, David Rientjes wrote:
> Hi everybody,
>
> There is a lot of excitement around upcoming CXL type 3 memory expansion
> devices and their cost savings potential.  As the industry starts to
> adopt this technology, one of the key components in strategic planning is
> how the upstream Linux kernel will support various tiered configurations
> to meet various user needs.  I think it goes without saying that this is
> quite interesting to cloud providers as well as other hyperscalers :)
>
> I think this discussion would benefit from a collaborative approach
> between various stakeholders and interested parties.  The reason is
> that there are several different use cases that need different support
> models, but also because there is great incentive toward moving "with"
> upstream Linux for this support rather than having multiple parties
> bringing up their own stacks only to find that they are diverging from
> upstream rather than converging with it.
>
> I'm interested to learn if there is interest in forming a "Linux Memory
> Tiering Work Group" to share ideas, discuss multi-faceted approaches, and
> keep track of work items?
>
> Some recent discussions have proven that there is widespread interest in
> some very foundational topics for this technology such as:
>
>  - Decoupling CPU balancing from memory balancing (or obsoleting CPU
>    balancing entirely)
>
>    + John Hubbard notes this would be useful for GPUs:
>
>      a) GPUs have their own processors that are invisible to the kernel's
>         NUMA "which tasks are active on which NUMA nodes" calculations,
>         and
>
>      b) Similar to where CXL is generally going, we have already built
>         fully memory-coherent hardware, which includes memory-only NUMA
>         nodes.
>
>  - In-kernel hot memory abstraction, informed by hardware hinting drivers
>    (including some architectures like Power10), usable as a NUMA Balancing
>    backend for promotion and other areas of the kernel like transparent
>    hugepage utilization
>
>  - NUMA and memory tiering enlightenment for accelerators, such as for
>    optimal use of GPU memory, extremely important for a cloud provider
>    (hint hint :)
>
>  - Asynchronous memory promotion independent of task_numa_fault() while
>    considering the cost of page migration (due to identifying cold memory)
>
> It looks like there is already some interest in such a working group that
> would have a biweekly discussion of shared interests with the goal of
> accelerating design, development, testing, and division of work:
>
>  Alistair Popple
>  Aneesh Kumar K V
>  Brian Morris
>  Christoph Lameter
>  Dan Williams
>  Gregory Price
>  Grimm, Jon
>  Huang, Ying
>  Johannes Weiner
>  John Hubbard
>  Zi Yan
>
> Specifically for the in-kernel hot memory abstraction topic, Google and
> Meta recently published an OCP base specification, "Hyperscale CXL Tiered
> Memory Expander Specification", available at
> https://drive.google.com/file/d/1fFfU7dFmCyl6V9-9qiakdWaDr9d38ewZ/view?usp=drive_link
> that would be great to discuss.
>
> There is also on-going work in the CXL Consortium to standardize some of
> the abstractions for CXL 3.1.
>
> If folks are interested in this topic and your name doesn't appear above
> (I already got you :), please:
>
>  - reply-all to this email to express interest and expand upon the list
>    of topics above to represent additional areas of interest that should
>    be included, *or*
>
>  - email me privately to express interest to make sure you are included
>
> Perhaps I'm overly optimistic, but one thing that would be absolutely
> *amazing* would be if we all had a very clear and understandable vision
> for how Linux will support the wide variety of use cases, even before
> that work is fully implemented (or even designed), by LSF/MM/BPF 2024
> time in May.
>
> Thanks!

Please add me to the CXL interested parties list.

John Groves (jgroves@xxxxxxxxxx / John@xxxxxxxxxxxxxx)