RE: Re: RE: FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL

Hi Frank,
Thank you for your interest in this topic and for sharing your opinion.

>On Fri, Mar 31, 2023 at 6:42 AM Matthew Wilcox <willy@xxxxxxxxxxxxx> wrote:
>>
>> On Fri, Mar 31, 2023 at 08:42:20PM +0900, Kyungsan Kim wrote:
>> > Given our experiences/design and industry's viewpoints/inquiries,
>> > I will prepare a few slides in the session to explain
>> >   1. Usecase - user/kernelspace memory tiering for near/far placement, memory virtualization between hypervisor/baremetal OS
>> >   2. Issue - movability(movable/unmovable), allocation(explicit/implicit), migration(intended/unintended)
>> >   3. HW - topology(direct, switch, fabric), feature(pluggability,error-handling,etc)
>>
>> I think you'll find everybody else in the room understands these issues
>> rather better than you do.  This is hardly the first time that we've
>> talked about CXL, and CXL is not the first time that people have
>> proposed disaggregated memory, nor heterogenous latency/bandwidth
>> systems.  All the previous attempts have failed, and I expect this
>> one to fail too.  Maybe there's something novel that means this time
>> it really will work, so any slides you do should focus on that.
>>
>> A more profitable discussion might be:
>>
>> 1. Should we have the page allocator return pages from CXL or should
>>    CXL memory be allocated another way?
>> 2. Should there be a way for userspace to indicate that it prefers CXL
>>    memory when it calls mmap(), or should it always be at the discretion
>>    of the kernel?
>> 3. Do we continue with the current ZONE_DEVICE model, or do we come up
>>    with something new?
>>
>>
>
>Point 2 is what I proposed talking about here:
>https://lore.kernel.org/linux-mm/a80a4d4b-25aa-a38a-884f-9f119c03a1da@xxxxxxxxxx/T/
>
>With the current cxl-as-numa-node model, an application can express a
>preference through mbind(). But that also means that mempolicy and
>madvise (e.g. MADV_COLD) are starting to overlap if the intention is
>to use cxl as a second tier for colder memory.  Are these the right
>abstractions? Might it be more flexible to attach properties to memory
>ranges, and have applications hint which properties they prefer?

We also think that more userspace hints would be useful, since applications have diverse needs.
The specific interfaces still need to be discussed, though.

FYI, we have in fact also extended mbind() and set_mempolicy() so that an application can bind explicitly to DDR or CXL memory:
  - mbind(,,MPOL_F_ZONE_EXMEM / MPOL_F_ZONE_NOEXMEM)
  - set_mempolicy(,,MPOL_F_ZONE_EXMEM / MPOL_F_ZONE_NOEXMEM)
madvise() is also a candidate for expressing tiering intent.
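
For comparison, here is a minimal sketch of how an application expresses a placement preference today with the unmodified mbind() interface under the cxl-as-numa-node model. It assumes the CXL memory shows up as an ordinary NUMA node whose number the caller already knows. The helper names (node_to_mask, prefer_node, map_on_preferred_node) are made up for illustration. The MPOL_F_ZONE_EXMEM / MPOL_F_ZONE_NOEXMEM flags above are deliberately not used, since they are not part of the mainline headers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <linux/mempolicy.h>   /* MPOL_PREFERRED */

/* Build a one-word node mask with only `node` set (illustrative helper). */
static unsigned long node_to_mask(int node)
{
    assert(node >= 0 && (size_t)node < 8 * sizeof(unsigned long));
    return 1UL << node;
}

/*
 * Ask the kernel to prefer `node` for the page-aligned range
 * [addr, addr + len). Returns 0 on success, -1 with errno set otherwise.
 * The raw syscall is used so that libnuma is not required.
 */
static long prefer_node(void *addr, size_t len, int node)
{
    unsigned long mask = node_to_mask(node);
    /* The kernel expects maxnode to be one more than the number of mask bits. */
    unsigned long maxnode = 8 * sizeof(mask) + 1;

    return syscall(SYS_mbind, addr, (unsigned long)len, MPOL_PREFERRED,
                   &mask, maxnode, 0U);
}

/*
 * Map `len` bytes of anonymous memory and hint that it should come from
 * `node` (assumed here to be the CXL node). The mapping is returned even
 * if the hint is rejected, e.g. on a non-NUMA kernel or in a sandbox;
 * the mbind() result is reported through `mbind_rc`.
 */
static void *map_on_preferred_node(size_t len, int node, long *mbind_rc)
{
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    long rc = prefer_node(p, len, node);
    if (mbind_rc)
        *mbind_rc = rc;
    return p;
}
```

MPOL_PREFERRED is used rather than MPOL_BIND so that allocation can fall back to other nodes when the preferred node is full. Whether such a hint should stay in mempolicy, move to madvise(), or become a property attached to the memory range is the open question raised above.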

>
>It's an interesting discussion, and I hope it'll be touched on at
>LSF/MM, happy to participate there.
>
>- Frank


