RE: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL

Thank you, Dan, for the kind reminder about the submission.
I filled out the form with topic suggestions and required attendees.
Hopefully we can elaborate on the topics with a wider range of opinions, revisiting the related prior kernel designs.


>Please be sure to log this in the submission spreadsheet as well. From the
>CFP:
>
>---
>
>1) Fill out the following Google form to request attendance and
>suggest any topics
>
>        https://forms.gle/VKVXjWGBHZbnsz226
>
>In previous years we have accidentally missed people's attendance
>requests because they either didn't cc lsf-pc@ or we simply missed them
>in the flurry of emails we get.  Our community is large and our
>volunteers are busy, filling this out will help us make sure we don't
>miss anybody.
>
>
>Kyungsan Kim wrote:
>> CXL is a promising technology that leads to fundamental changes in
>> computing architecture.  To facilitate the adoption and widespread
>> use of CXL memory, we are developing a memory tiering solution
>> called SMDK[1][2].  Using SMDK and CXL RAM devices, our team has
>> been working with industry and academic partners over the last
>> year.  Also, thanks to many researchers' efforts, the CXL adoption
>> stage is gradually moving from basic enablement toward real-world
>> composite use cases.  At this point, based on the research and
>> experience gained working on SMDK, we would like to propose a
>> session at LSF/MM/BPF this year to discuss possible Linux MM
>> changes, with a brief overview of SMDK.
>>
>> Adam Manzanares kindly advised me that LSF/MM/BPF prefers
>> discussing implementation details for a given problem on which
>> consensus already exists.  Considering the adoption stage of CXL
>> technology, however, let me suggest a design-level discussion of
>> the two MM expansions of SMDK this year.  Once we reach design
>> consensus with the participants, we hope to continue with follow-up
>> discussions on implementation details.
>>
>> 
>> 1. A new zone, ZONE_EXMEM
>>
>> We added ZONE_EXMEM to manage CXL RAM device(s), separate from
>> ZONE_NORMAL, which serves conventional DRAM, for the three reasons
>> below (a rough sketch of the zone follows the list).
>>
>> 1) A CXL RAM device has many characteristics that differ from
>> conventional DRAM because a CXL device inherits and extends the
>> PCIe specification, e.g., frequency range, pluggability, link
>> speed/width negotiation, host/device flow control, power
>> throttling, channel-interleaving methodology, and error handling.
>> The primary use case of CXL RAM is likely to be system RAM.
>> However, to handle these hardware differences properly, different
>> MM algorithms are needed.
>>
>> 2) Historically, zones have been added to reflect the evolution of
>> CPU, IO, and memory devices, e.g., ZONE_DMA(32), ZONE_HIGHMEM,
>> ZONE_DEVICE, and ZONE_MOVABLE.  Each zone applies different MM
>> algorithms for page reclaim, compaction, migration, and
>> fragmentation.  At first, we tried reusing the existing ZONE_DEVICE
>> and ZONE_MOVABLE zones for CXL RAM.  However, the purpose and
>> implementation of those zones do not fit CXL RAM.
>>
>> 3) Industry is preparing CXL-capable systems that connect dozens of
>> CXL devices in a single server.  When each CXL device becomes a
>> separate node, an administrator/programmer needs to be aware of,
>> and manually control, all nodes using third-party software such as
>> numactl and libnuma (see the sketch below).  ZONE_EXMEM allows
>> assembling CXL RAM devices into a single zone and provides an
>> abstraction to userspace by managing the devices seamlessly.  The
>> zone can also interleave the assembled devices in software to
>> aggregate bandwidth.  We would like to discuss whether this can
>> coexist with HW interleaving, analogous to SW/HW RAID0.  For
>> reference, see the node partition part of the picture[3].
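>>
>> To illustrate the burden, here is a minimal userspace sketch of
>> today's manual, node-id-based placement with libnuma (the node id
>> below is hypothetical and shifts whenever the CXL topology changes):
>>
>>   /* build with: gcc place.c -lnuma */
>>   #include <numa.h>
>>
>>   int main(void)
>>   {
>>           if (numa_available() < 0)
>>                   return 1;
>>           /* The admin must know that, e.g., node 2 is a CXL node. */
>>           void *buf = numa_alloc_onnode(1 << 20, 2);
>>           /* Software interleave across all nodes in the task's mask. */
>>           void *ibuf = numa_alloc_interleaved(1 << 20);
>>           numa_free(buf, 1 << 20);
>>           numa_free(ibuf, 1 << 20);
>>           return 0;
>>   }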
>>
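>> As a design aid, here is a minimal sketch of where ZONE_EXMEM might
>> sit in include/linux/mmzone.h; it mirrors the description above and
>> is not the actual SMDK patch:
>>
>>   enum zone_type {
>>   #ifdef CONFIG_ZONE_DMA
>>           ZONE_DMA,
>>   #endif
>>   #ifdef CONFIG_ZONE_DMA32
>>           ZONE_DMA32,
>>   #endif
>>           ZONE_NORMAL,
>>   #ifdef CONFIG_HIGHMEM
>>           ZONE_HIGHMEM,
>>   #endif
>>           ZONE_MOVABLE,
>>   #ifdef CONFIG_ZONE_EXMEM
>>           ZONE_EXMEM,     /* sketch: aggregates CXL RAM devices */
>>   #endif
>>   #ifdef CONFIG_ZONE_DEVICE
>>           ZONE_DEVICE,
>>   #endif
>>           __MAX_NR_ZONES
>>   };
>>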
>>
>> 2. User/Kernelspace Programmable Interface
>>
>> A memory tiering solution typically attempts to place hot data on
>> near memory and cold data on far memory as accurately as
>> possible[4][5][6][7].  We observed that the hot/coldness of data is
>> determined by the memory access pattern of the running application
>> and/or kernel context.  Hence, a running context needs a near/far
>> memory identifier to tell near memory from far memory.  When CXL
>> RAM(s) is exposed as a NUMA node, the node id can more or less
>> function as a CXL identifier.  However, the node id is ephemeral
>> information that varies dynamically with the online status of the
>> CXL topology and system sockets.  For this reason, we provide
>> programmable interfaces that let userspace and kernelspace contexts
>> explicitly (de)allocate memory from DRAM and CXL RAM regardless of
>> such system changes.  Specifically, the MAP_EXMEM and GFP_EXMEM
>> flags were added to the mmap() syscall and the kmalloc() family,
>> respectively (see the sketch below).
>>
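>> To make the interface concrete, a hedged usage sketch follows; the
>> flag value is illustrative and neither it nor the flags are
>> upstream:
>>
>>   /* Userspace: place a mapping on CXL RAM explicitly,
>>    * independent of the node ids currently online. */
>>   #include <sys/mman.h>
>>   #ifndef MAP_EXMEM
>>   #define MAP_EXMEM 0x200000      /* illustrative value only */
>>   #endif
>>   void *p = mmap(NULL, 1 << 20, PROT_READ | PROT_WRITE,
>>                  MAP_PRIVATE | MAP_ANONYMOUS | MAP_EXMEM, -1, 0);
>>
>>   /* Kernelspace: allocate a slab object from CXL RAM. */
>>   void *obj = kmalloc(size, GFP_KERNEL | GFP_EXMEM);
>>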
>> Thanks to Adam Manzanares for reviewing this CFP thoroughly.
>>
>>
>> [1] SMDK: https://github.com/openMPDK/SMDK
>> [2] SMT: Software-defined Memory Tiering for Heterogeneous Computing Systems with CXL Memory Expander, https://ieeexplore.ieee.org/document/10032695
>> [3] SMDK node partition: https://github.com/OpenMPDK/SMDK/wiki/2.-SMDK-Architecture#memory-partition
>> [4] TMO: Transparent Memory Offloading in Datacenters, https://dl.acm.org/doi/10.1145/3503222.3507731
>> [5] TPP: Transparent Page Placement for CXL-Enabled Tiered Memory, https://arxiv.org/abs/2206.02878
>> [6] Pond: CXL-Based Memory Pooling Systems for Cloud Platforms, https://dl.acm.org/doi/10.1145/3575693.3578835
>> [7] Hierarchical NUMA: https://blog.linuxplumbersconf.org/2017/ocw/system/presentations/4656/original/Hierarchical_NUMA_Design_Plumbers_2017.pdf
>


