Re: FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL

David Hildenbrand <david@xxxxxxxxxx> · Mon, 3 Apr 2023 10:34:19 +0200

On 31.03.23 17:56, Frank van der Linden wrote:
> On Fri, Mar 31, 2023 at 6:42 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>
>> On Fri, Mar 31, 2023 at 08:42:20PM +0900, Kyungsan Kim wrote:
>>> Given our experiences/design and the industry's viewpoints/inquiries,
>>> I will prepare a few slides in the session to explain
>>>    1. Use case - user/kernel-space memory tiering for near/far placement, memory virtualization between hypervisor and bare-metal OS
>>>    2. Issue - movability (movable/unmovable), allocation (explicit/implicit), migration (intended/unintended)
>>>    3. HW - topology (direct, switch, fabric), features (pluggability, error handling, etc.)

>> I think you'll find everybody else in the room understands these issues
>> rather better than you do.  This is hardly the first time that we've
>> talked about CXL, and CXL is not the first time that people have
>> proposed disaggregated memory, nor heterogeneous latency/bandwidth
>> systems.  All the previous attempts have failed, and I expect this
>> one to fail too.  Maybe there's something novel that means this time
>> it really will work, so any slides you do should focus on that.
>>
>> A more profitable discussion might be:
>>
>> 1. Should we have the page allocator return pages from CXL or should
>>     CXL memory be allocated another way?
>> 2. Should there be a way for userspace to indicate that it prefers CXL
>>     memory when it calls mmap(), or should it always be at the discretion
>>     of the kernel?
>> 3. Do we continue with the current ZONE_DEVICE model, or do we come up
>>     with something new?

> Point 2 is what I proposed talking about here:
> https://lore.kernel.org/linux-mm/a80a4d4b-25aa-a38a-884f-9f119c03a1da@xxxxxxxxxx/T/
>
> With the current cxl-as-numa-node model, an application can express a
> preference through mbind(). But that also means that mempolicy and
> madvise (e.g. MADV_COLD) are starting to overlap if the intention is
> to use cxl as a second tier for colder memory.  Are these the right
> abstractions? Might it be more flexible to attach properties to memory
> ranges, and have applications hint which properties they prefer?
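A minimal sketch of the two overlapping user-space interfaces mentioned above, assuming CXL memory is onlined as its own NUMA node. The node id is system-specific, so the sketch takes it as a parameter, and the helper names prefer_node() and mark_cold() are hypothetical. It uses the raw mbind(2) syscall with MPOL_PREFERRED (so libnuma is not needed) and madvise(2) with MADV_COLD:

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h> /* MPOL_PREFERRED */

#ifndef MADV_COLD
#define MADV_COLD 20 /* Linux UAPI value; the advice exists since Linux 5.4 */
#endif

/*
 * Option 1 (mempolicy): prefer NUMA node `node` for [addr, addr + len).
 * If CXL memory is exposed as a separate (CPU-less) NUMA node, this is how
 * an application says "put this range on CXL". `addr` must be page aligned,
 * and `node` is assumed to be a small node id that fits in one bitmask word.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int prefer_node(void *addr, size_t len, int node)
{
    unsigned long nodemask = 1UL << node;
    return (int)syscall(SYS_mbind, addr, (unsigned long)len,
                        (unsigned long)MPOL_PREFERRED, &nodemask,
                        (unsigned long)(sizeof(nodemask) * 8), 0UL);
}

/*
 * Option 2 (madvise): tell the kernel the range is cold. MADV_COLD only
 * deactivates the pages so that reclaim considers them first; where demotion
 * to a slower tier is enabled, reclaim may demote them instead of evicting.
 */
static int mark_cold(void *addr, size_t len)
{
    return madvise(addr, len, MADV_COLD);
}
```

Both calls can express roughly the same intent ("this range belongs on slower memory"), one by naming a node and one by describing access temperature, which is the overlap in question.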

I think history has taught us that the discussion always goes like "but user
space wants more control, let's give user space all the power", and a
couple of months later we get "but we cannot possibly enlighten all
applications, and user space does not have sufficient information: we
need the kernel to handle this transparently."

It seems to be a steady back and forth. Most probably we want something
in between: the cxl-as-numa-node model is already a pretty good and
simple abstraction. Avoiding too many new special user-space knobs is
most probably the way to go.

Interesting discussion, I agree. And we have had plenty of similar ones
already with PMEM and NUMA in general.

--
Thanks,

David / dhildenb