On Wed, 8 May 2024, Huang, Ying wrote:

> > Hi all,
> >
> > I think it would be very worthwhile to have a block set aside for
> > discussion on locally attached memory tiering extensions at LSF/MM/BPF
> > 2024.
> >
> > Primarily interested in discussing Linux enlightenment for CXL 1.1 and
> > later type-3 memory expansion devices (CXL.mem).  I think we could touch
> > on CXL 2.0 and later memory pooling architectures if we have time and
> > there is interest, but the primary focus here would be locally attached.
> >
> > Based on the premise for a Memory Tiering Working Group[1], there is
> > widespread interest in the foundational topics for generally useful Linux
> > enlightenment:
> >
> >  - Decoupling CPU balancing from memory balancing (or obsoleting CPU
> >    balancing entirely)
> >
> >    + John Hubbard notes this would be useful for GPUs:
> >
> >       a) GPUs have their own processors that are invisible to the kernel's
> >          NUMA "which tasks are active on which NUMA nodes" calculations,
> >          and
> >
> >       b) Similar to where CXL is generally going, we have already built
> >          fully memory-coherent hardware, which includes memory-only NUMA
> >          nodes.
> >
> >  - In-kernel hot memory abstraction, informed by hardware hinting drivers
> >    (incl some architectures like Power10), usable as a NUMA Balancing
> >    backend for promotion and other areas of the kernel like transparent
> >    hugepage utilization
> >
> >  - NUMA and memory tiering enlightenment for accelerators, such as for
> >    optimal use of GPU memory, extremely important for a cloud provider
> >    (hint hint :)
> >
> >  - Asynchronous memory promotion independent of task_numa_fault() while
> >    considering the cost of page migration (due to identifying cold memory)
> >
> >  - What role userspace plays in this decision-making and how we can
> >    extend the default policy and mechanisms in the kernel to allow for it
> >    if necessary
> >
> > Additional topics that you find interesting are also very helpful!
>
> In addition to the hot memory identification and promotion, I think that
> we should consider the cold memory identification and demotion too as a
> full solution.  The existing method based on the page table accessed bit
> may be good enough, but we still need to consider the full solution in
> the context of the general NUMA balancing.
>

I think that's a great suggestion!  We'll be able to cover the approach
taken by workingset reporting[*], which is quite powerful for the purposes
of proactive reclaim through memory.reclaim and would also be very useful
for identifying cold memory for the purposes of demotion.

 [*] https://lore.kernel.org/linux-mm/20240504073011.4000534-1-yuanchu@xxxxxxxxxx/T/

> > I'm biased toward a generally useful solution that would leverage the
> > kernel as the ultimate source of truth for page hotness that can be
> > extended for multiple use cases, one of which is memory tiering support.
> > But certainly if there are other approaches, we can discuss that as well.
> >
> > A few main goals from this discussion:
> >
> >  - Ensure that proposals address, or can be extended to address, the
> >    emerging needs of the various use cases that users may have
> >
> >  - Surface any constraints that stakeholders may find to be prohibitive
> >    for support in the core MM subsystem
> >
> >  - Alignment and division of work for developers who are actively looking
> >    to contribute to this area
> >
> > As I'm just one of many stakeholders for this discussion, I'd nominate
> > Michal Hocko to moderate it if he's willing to do so.  If he's so willing,
> > we'd be in good hands :)
> >
> > [1] https://lore.kernel.org/linux-mm/45d850ec-623b-7c07-c266-e948cdbf1f62@xxxxxxxxx/T/
>
> --
> Best Regards,
> Huang, Ying
>