RE: RE: RE(3): FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL

Hi Jorgen Hansen,
Thank you for joining this topic and sharing your thoughts.
I'm sorry for the late reply; our team had some major tasks this week.

>> On 24 Mar 2023, at 10.50, Kyungsan Kim <ks0204.kim@xxxxxxxxxxx> wrote:
>> 
>>> On 24.03.23 10:27, Kyungsan Kim wrote:
>>>>> On 24.03.23 10:09, Kyungsan Kim wrote:
>>>>>> Thank you David Hildenbrand for your interest in this topic.
>>>>>> 
>>>>>>>> 
>>>>>>>>> Kyungsan Kim wrote:
>>>>>>>>> [..]
>>>>>>>>>>> In addition to CXL memory, we may have other kind of memory in the
>>>>>>>>>>> system, for example, HBM (High Bandwidth Memory), memory in FPGA card,
>>>>>>>>>>> memory in GPU card, etc.  I guess that we need to consider them
>>>>>>>>>>> together.  Do we need to add one zone type for each kind of memory?
>>>>>>>>>> 
>>>>>>>>>> We also don't think a new zone is needed for every single memory
>>>>>>>>>> device.  Our viewpoint is the sole ZONE_NORMAL becomes not enough to
>>>>>>>>>> manage multiple volatile memory devices due to the increased device
>>>>>>>>>> types.  Including CXL DRAM, we think the ZONE_EXMEM can be used to
>>>>>>>>>> represent extended volatile memories that have different HW
>>>>>>>>>> characteristics.
>>>>>>>>> 
>>>>>>>>> Some advice for the LSF/MM discussion, the rationale will need to be
>>>>>>>>> more than "we think the ZONE_EXMEM can be used to represent extended
>>>>>>>>> volatile memories that have different HW characteristics". It needs to
>>>>>>>>> be along the lines of "yes, to date Linux has been able to describe DDR
>>>>>>>>> with NUMA effects, PMEM with high write overhead, and HBM with improved
>>>>>>>>> bandwidth not necessarily latency, all without adding a new ZONE, but a
>>>>>>>>> new ZONE is absolutely required now to enable use case FOO, or address
>>>>>>>>> unfixable NUMA problem BAR." Without FOO and BAR to discuss the code
>>>>>>>>> maintainability concern of "fewer degrees of freedom in the ZONE
>>>>>>>>> dimension" starts to dominate.
>>>>>>>> 
>>>>>>>> One problem we experienced occurred in the combination of the hot-remove and kernelspace allocation use cases.
>>>>>>>> ZONE_NORMAL allows kernel context allocation, but it does not allow hot-remove because kernel allocations stay resident all the time.
>>>>>>>> ZONE_MOVABLE allows hot-remove thanks to page migration, but it only allows userspace allocation.
>>>>>>>> Alternatively, we allocated a kernel context out of ZONE_MOVABLE by adding the GFP_MOVABLE flag.
>>>>>> 
>>>>>>> That sounds like a bad hack :) .
>>>>>> I agree with you.
>>>>>> 
>>>>>>>> In that case, oopses and system hangs occasionally occurred because ZONE_MOVABLE pages can be swapped.
>>>>>>>> We resolved the issue using ZONE_EXMEM by allowing a selective choice between the two use cases.
>>>>>> 
>>>>>>> I once raised the idea of a ZONE_PREFER_MOVABLE [1], maybe that's
>>>>>>> similar to what you have in mind here. In general, adding new zones is
>>>>>>> frowned upon.
>>>>>> 
>>>>>> Actually, we have already studied your idea and think it is similar to ours in two aspects.
>>>>>> 1. ZONE_PREFER_MOVABLE allows kernelspace allocation using a new zone.
>>>>>> 2. ZONE_PREFER_MOVABLE helps reduce fragmentation by splitting zones and ordering allocation requests from the zones.
>>>>>> 
>>>>>> We think ZONE_EXMEM also helps reduce fragmentation,
>>>>>> because it is a separate zone and handles page allocations as movable by default.
>>>>> 
>>>>> So how is it different that it would justify a different (more confusing
>>>>> IMHO) name? :) Of course, names don't matter that much, but I'd be
>>>>> interested in which other aspect that zone would be "special".
>>>> 
>>>> FYI, I first named it ZONE_CXLMEM, but we thought it would need to cover other extended memory types as well.
>>>> So I changed it to ZONE_EXMEM.
>>>> We would also like to highlight the "special" aspect of the zone, which differs from ZONE_NORMAL for traditional DDR DRAM.
>>>> Of course, the symbol name does matter to some extent, so that it represents the zone well.
>>>> Do you prefer ZONE_SPECIAL? :)
>>> 
>>> I called it ZONE_PREFER_MOVABLE. If you studied that approach there must
>>> be a good reason to name it differently?
>>> 
>> 
>> The intention of ZONE_EXMEM is a separate logical management dimension that originates from the HW differences of extended memory devices.
>> Although ZONE_EXMEM considers the movable and fragmentation aspects, they are not all that ZONE_EXMEM considers.
>> That is why it is named as it is.
>
>Given that CXL memory devices can potentially cover a wide range of technologies with quite different latency and bandwidth metrics, will one zone serve as the management vehicle that you seek? If a system contains both CXL-attached DRAM and, let's say, a byte-addressable CXL SSD - both used as (different) byte-addressable tiers in a tiered memory hierarchy, allocating memory from the ZONE_EXMEM doesn’t really tell you much about what you get. So the client would still need an orthogonal method to characterize the desired performance characteristics.

I agree that a heterogeneous system would be able to adopt multiple types of extended memory devices.
We think ZONE_EXMEM can apply a different management algorithm for each extended memory type.
What we have in mind is ZONE_NORMAL : ZONE_EXMEM = 1 : N, where N is the number of HW device types.
ZONE_NORMAL is for conventional DDR DRAM on the DIMM form factor (F/F), while ZONE_EXMEM is for extended memories such as CXL DRAM and CXL SSD on other F/Fs such as EDSFF.

We think the movable attribute is a requirement for CXL DRAM devices.
However, there are other SW points we are concerned about, namely implicit allocation and unintended migration, that arise from CXL HW differences.
So I'm not sure whether it is possible, or a good idea, to cover these matters with a combination of the ZONE_MOVABLE and ZONE_PREFER_MOVABLE designs.
Let me point out again that we proposed ZONE_EXMEM for the special logical management of extended memory devices.
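
To make the zone trade-off discussed in this thread concrete, here is a toy Python model. It is not kernel code: `Zone`, `ZoneTraits`, and `traits()` are hypothetical names, the ZONE_NORMAL and ZONE_MOVABLE rows only restate the simplified description given earlier in this thread, and the per-use-case switch for ZONE_EXMEM (`exmem_kernel_alloc`) is an assumption about how the selective choice could look.

```python
from dataclasses import dataclass
from enum import Enum


class Zone(Enum):
    NORMAL = "ZONE_NORMAL"
    MOVABLE = "ZONE_MOVABLE"
    EXMEM = "ZONE_EXMEM"  # proposed in this thread, not an upstream zone


@dataclass(frozen=True)
class ZoneTraits:
    kernel_alloc: bool  # can serve kernel-context allocations
    hot_remove: bool    # backing memory can be hot-removed


def traits(zone: Zone, exmem_kernel_alloc: bool = False) -> ZoneTraits:
    """Return the simplified zone properties as described in this thread."""
    if zone is Zone.NORMAL:
        return ZoneTraits(kernel_alloc=True, hot_remove=False)
    if zone is Zone.MOVABLE:
        return ZoneTraits(kernel_alloc=False, hot_remove=True)
    # Assumption: ZONE_EXMEM picks one of the two use cases per configuration.
    return ZoneTraits(kernel_alloc=exmem_kernel_alloc,
                      hot_remove=not exmem_kernel_alloc)
```

In this model, `traits(Zone.EXMEM, exmem_kernel_alloc=True)` describes a region that accepts kernel allocations and gives up hot-remove, while the default keeps the region hot-removable.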

As for the performance metrics specifically, we think they should be handled per node, not in the zone.
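
As a rough illustration of that split, the following toy Python sketch keeps the zone as the management class and attaches the performance attribute to each node. Everything here is hypothetical: `Node`, `NODES`, `pick_node()`, and `latency_rank` are made-up names, and the ranks are only an assumed ordering (lower means faster), not measured data.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Node:
    node_id: int
    zone: str          # management class: "ZONE_NORMAL" or the proposed "ZONE_EXMEM"
    device: str        # HW device type behind the node
    latency_rank: int  # assumed ordering only, lower means faster


# Hypothetical system: one ZONE_NORMAL device type, N = 2 ZONE_EXMEM device types.
NODES = [
    Node(0, "ZONE_NORMAL", "DDR DRAM", 0),
    Node(1, "ZONE_EXMEM", "CXL DRAM", 1),
    Node(2, "ZONE_EXMEM", "CXL SSD", 2),
]


def pick_node(nodes: List[Node], zone: str, max_rank: int) -> Optional[Node]:
    """Filter by zone (management policy), then choose by per-node performance."""
    fits = [n for n in nodes if n.zone == zone and n.latency_rank <= max_rank]
    return min(fits, key=lambda n: n.latency_rank) if fits else None
```

Here the zone alone does not tell the client what performance it gets; the node does. For example, `pick_node(NODES, "ZONE_EXMEM", 2)` selects the CXL DRAM node because it has the lowest rank among the ZONE_EXMEM nodes.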


>This method could be combined with a fabric independent zone such as ZONE_PREFER_MOVABLE to address the kernel allocation issue. At the same time, this new zone could also be useful in other cases, such as virtio-mem.

We agree with you. As CXL memory pools and fabrics are adopted, virtualization SW layers will be added.
Considering not only a bare-metal OS but also memory inflation/deflation between a bare-metal OS and a hypervisor, we think ZONE_EXMEM can be useful as the identifier for CXL memory.


>
>Thanks,
>Jorgen


