On Thu, Apr 27, 2023 at 10:10 AM Frank van der Linden <fvdl@xxxxxxxxxx> wrote:
>
> On Wed, Apr 26, 2023 at 9:30 PM David Rientjes <rientjes@xxxxxxxxxx> wrote:
> >
> > Hi everybody,
> >
> > As requested, sending along a last minute topic suggestion for
> > consideration for LSF/MM/BPF 2023 :)
> >
> > For a sizable set of emerging technologies, memory tiering presents one
> > of the most formidable challenges and exciting opportunities for the MM
> > subsystem today.
> >
> > "Memory tiering" can mean many different things based on the user: from
> > traditional everyday NUMA, to swap (to zswap), to NVDIMMs, to HBM, to
> > locally attached CXL memory, to memory borrowing over PCIe, to memory
> > pooling with disaggregation, and beyond.
> >
> > Just as NUMA started out only being useful for supercomputers, memory
> > tiering will likely evolve over the next five years to take on an
> > expanding set of use cases, likely with rapidly increasing adoption
> > even beyond hyperscalers.
> >
> > I think a discussion about memory tiering would be highly valuable. A
> > few key questions that I think can drive this discussion:
> >
> > - What are the various form factors that must be supported as
> >   short-term goals, as well as 5+ years into the future?
> >
> > - What incremental changes need to be made on top of NUMA support to
> >   fully support the wide range of use cases that will be coming? (Is
> >   memory tiering support built entirely upon NUMA?)
> >
> > - What is the minimum viable *default* support that the MM subsystem
> >   should provide for tiered configs? What is the set of optimizations
> >   that should be left to userspace or BPF to control?
> >
> > - What are the various page promotion techniques that we must plan for
> >   beyond traditional NUMA balancing that will allow us to exploit
> >   hardware innovation?
> >
> > (And I'm sure there are more topics of discussion that others would
> > readily add. It would be great to have additional ideas in replies.)
> >
> > A key challenge in all of this is to make memory tiering support in the
> > upstream kernel compatible with the roadmaps of various CPU vendors. A
> > key goal is to ensure the end user benefits from all of this rapid
> > innovation with generalized support that is well abstracted and allows
> > for extensibility.
>
> Thank you for bringing this one up. Memory tiering is a very important
> topic that should definitely be discussed. I'm especially interested in
> the userspace control part (which I proposed as a separate topic, but I'm
> happy to see it addressed as part of this discussion too, as that is
> where the motivation originally came from). With the increased complexity
> introduced by memory tiers, is it still possible to provide a
> one-size-fits-all default? If there is such a default, is it accurately
> represented by the current model of NUMA nodes, where pages are demoted
> to a slower tier as a 'reclaim' operation (i.e. you basically map a
> global LRU model onto tiers of increasing latency)? Are there reasons to
> break that model, and should applications be able to do that? Is the
> current mempolicy/madvise model sufficient?
>
> - Frank

I am definitely interested in the discussions on memory tiering as well.
In particular:

- What should be the interface to configure and initialize various memory
  devices, especially CXL.mem devices, as tiered memory nodes/zones? (A
  sketch of the sysfs view that recent kernels expose is in the P.S.
  below.)

- What kind of framework do we need to leverage existing and future
  hardware support (e.g. accessed bits/counters, PMU/IBS, etc.) for page
  promotions?

- How can the userspace influence the memory tiering policies? (A sketch
  of what mempolicy lets applications express today follows this list.)

- What kind of memory tiering controls do we want to provide for cgroups?
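
To make Frank's question about the mempolicy/madvise model concrete,
here is a minimal sketch of what an application can express today for
tier placement: a hard bind to one node, or (since 5.15) a soft
preference over a set of nodes via MPOL_PREFERRED_MANY. The node
numbers are made up for illustration; this only sketches the existing
interface and is not a proposal. Build against libnuma headers and
link with -lnuma:

/*
 * Minimal sketch, assuming a machine where node 0 is the fast/local
 * tier and node 1 a slower tier (node numbers are assumptions).
 * Build: cc -o placement placement.c -lnuma
 */
#include <numaif.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MPOL_PREFERRED_MANY
#define MPOL_PREFERRED_MANY 5	/* older numaif.h may not define it */
#endif

int main(void)
{
	size_t len = 64UL << 20;	/* 64 MiB scratch buffer */
	void *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		perror("mmap");
		return 1;
	}

	/* Hard-bind the range to node 0 (assumed fast tier): no
	 * fallback to other nodes at allocation time. */
	unsigned long fast = 1UL << 0;
	if (mbind(buf, len, MPOL_BIND, &fast, sizeof(fast) * 8, 0))
		perror("mbind(MPOL_BIND)");

	/* Or express a preference across both tiers and let the
	 * kernel fall back -- closer to what a tiering-aware
	 * application might actually want. (This replaces the
	 * policy set above; the two calls are alternatives.) */
	unsigned long both = (1UL << 0) | (1UL << 1);
	if (mbind(buf, len, MPOL_PREFERRED_MANY, &both,
		  sizeof(both) * 8, 0))
		perror("mbind(MPOL_PREFERRED_MANY)");

	memset(buf, 0, len);	/* fault pages in under the policy */
	munmap(buf, len);
	return 0;
}

Note that mempolicy is almost entirely about initial placement; it says
little about ongoing movement between tiers (when to promote, what
counts as hot, how aggressively to demote), which is where I think the
interesting userspace policy questions are.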

Wei
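
P.S. On the first question above: with recent kernels (~v6.1 and
later), a CXL.mem or pmem region surfaced as a DAX device can already
be onlined as a memory-only NUMA node (e.g. via the dax/kmem driver,
"daxctl reconfigure-device dax0.0 --mode=system-ram", device name made
up), and the resulting tier topology shows up under
/sys/devices/virtual/memory_tiering/. A rough sketch of reading that
back, with the caveat that these paths are current as of v6.1+ but may
change:

/*
 * Rough sketch: list each memory tier and the NUMA nodes it
 * contains, then report whether demotion is enabled. Assumes the
 * sysfs layout of ~v6.1+ kernels.
 */
#include <dirent.h>
#include <stdio.h>
#include <string.h>

static void print_file(const char *path)
{
	char line[256];
	FILE *f = fopen(path, "r");

	if (!f) {
		printf("(unavailable)\n");
		return;
	}
	if (fgets(line, sizeof(line), f))
		fputs(line, stdout);	/* sysfs contents end in '\n' */
	fclose(f);
}

int main(void)
{
	const char *base = "/sys/devices/virtual/memory_tiering";
	char path[512];
	struct dirent *de;
	DIR *dir = opendir(base);

	if (!dir) {
		perror(base);	/* pre-6.1 kernel, or no tiers set up */
		return 1;
	}
	while ((de = readdir(dir)) != NULL) {
		/* Tier IDs derive from "abstract distance": lower
		 * means nearer/faster memory. */
		if (strncmp(de->d_name, "memory_tier", 11))
			continue;
		snprintf(path, sizeof(path), "%s/%s/nodelist",
			 base, de->d_name);
		printf("%s: nodes ", de->d_name);
		print_file(path);
	}
	closedir(dir);

	printf("demotion enabled: ");
	print_file("/sys/kernel/mm/numa/demotion_enabled");
	return 0;
}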