RE: Re: FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL

Kyungsan Kim <ks0204.kim@xxxxxxxxxxx> · Wed, 5 Apr 2023 11:16:55 +0900

>On 31.03.23 17:56, Frank van der Linden wrote:
>> On Fri, Mar 31, 2023 at 6:42 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>>
>>> On Fri, Mar 31, 2023 at 08:42:20PM +0900, Kyungsan Kim wrote:
>>>> Given our experiences/design and industry's viewpoints/inquiries,
>>>> I will prepare a few slides in the session to explain
>>>>    1. Usecase - user/kernelspace memory tiering for near/far placement, memory virtualization between hypervisor/baremetal OS
>>>>    2. Issue - movability(movable/unmovable), allocation(explicit/implicit), migration(intended/unintended)
>>>>    3. HW - topology(direct, switch, fabric), feature(pluggability,error-handling,etc)
>>>
>>> I think you'll find everybody else in the room understands these issues
>>> rather better than you do.  This is hardly the first time that we've
>>> talked about CXL, and CXL is not the first time that people have
>>> proposed disaggregated memory, nor heterogenous latency/bandwidth
>>> systems.  All the previous attempts have failed, and I expect this
>>> one to fail too.  Maybe there's something novel that means this time
>>> it really will work, so any slides you do should focus on that.
>>>
>>> A more profitable discussion might be:
>>>
>>> 1. Should we have the page allocator return pages from CXL or should
>>>     CXL memory be allocated another way?
>>> 2. Should there be a way for userspace to indicate that it prefers CXL
>>>     memory when it calls mmap(), or should it always be at the discretion
>>>     of the kernel?
>>> 3. Do we continue with the current ZONE_DEVICE model, or do we come up
>>>     with something new?
>>>
>>>
>> 
>> Point 2 is what I proposed talking about here:
>> https://lore.kernel.org/linux-mm/a80a4d4b-25aa-a38a-884f-9f119c03a1da@xxxxxxxxxx/T/
>> 
>> With the current cxl-as-numa-node model, an application can express a
>> preference through mbind(). But that also means that mempolicy and
>> madvise (e.g. MADV_COLD) are starting to overlap if the intention is
>> to use cxl as a second tier for colder memory.  Are these the right
>> abstractions? Might it be more flexible to attach properties to memory
>> ranges, and have applications hint which properties they prefer?
>
>I think history told us that the discussions always go like "but user 
>space wants more control, let's give user space all the power", and a 
>couple of months later we get "but we cannot possibly enlighten all 
>applications, and user space does not have sufficient information: we 
>need the kernel to handle this transparently."
>
>It seems to be a steady back and forth. Most probably we want something 
>in between: cxl-as-numa-node model is already a pretty good and 
>simplistic abstraction. Avoiding too many new special user-space knobs is 
>most probably the way to go.
>
>Interesting discussion, I agree. And we had plenty of similar ones 
>already with PMEM and NUMA in general.
>

Haha, funny sentences. IMHO the two somewhat contradictory needs exist all the time in the real world.
Based on my experience, some userland applications prefer transparent use, while others are eager for a chance to optimize.
I would also put higher priority on the transparent side, though.
From the viewpoint of Linux as a general-purpose OS, I believe it has also been a common approach that Linux supports a basic operation and further provides tunables through APIs or configuration to cover as many needs as possible.

>-- 
>Thanks,
>
>David / dhildenb