>Kyungsan Kim <ks0204.kim@xxxxxxxxxxx> writes:
>
>> I appreciate dan for the careful advice.
>>
>>>Kyungsan Kim wrote:
>>>[..]
>>>> >In addition to CXL memory, we may have other kind of memory in the
>>>> >system, for example, HBM (High Bandwidth Memory), memory in FPGA card,
>>>> >memory in GPU card, etc. I guess that we need to consider them
>>>> >together. Do we need to add one zone type for each kind of memory?
>>>>
>>>> We also don't think a new zone is needed for every single memory
>>>> device. Our viewpoint is the sole ZONE_NORMAL becomes not enough to
>>>> manage multiple volatile memory devices due to the increased device
>>>> types. Including CXL DRAM, we think the ZONE_EXMEM can be used to
>>>> represent extended volatile memories that have different HW
>>>> characteristics.
>>>
>>>Some advice for the LSF/MM discussion, the rationale will need to be
>>>more than "we think the ZONE_EXMEM can be used to represent extended
>>>volatile memories that have different HW characteristics". It needs to
>>>be along the lines of "yes, to date Linux has been able to describe DDR
>>>with NUMA effects, PMEM with high write overhead, and HBM with improved
>>>bandwidth not necessarily latency, all without adding a new ZONE, but a
>>>new ZONE is absolutely required now to enable use case FOO, or address
>>>unfixable NUMA problem BAR." Without FOO and BAR to discuss the code
>>>maintainability concern of "fewer degrees of freedom in the ZONE
>>>dimension" starts to dominate.
>>
>> One problem we experienced was occurred in the combination of hot-remove and kernelspace allocation usecases.
>> ZONE_NORMAL allows kernel context allocation, but it does not allow hot-remove because kernel resides all the time.
>> ZONE_MOVABLE allows hot-remove due to the page migration, but it only allows userspace allocation.
>> Alternatively, we allocated a kernel context out of ZONE_MOVABLE by adding GFP_MOVABLE flag.
>> In case, oops and system hang has occasionally occurred because ZONE_MOVABLE can be swapped.
>> We resolved the issue using ZONE_EXMEM by allowing selectively choice of the two usecases.
>
>Sorry, I don't get your idea. You want the memory range
>
> 1. can be hot-removed
> 2. allow kernel context allocation
>
>This appears impossible for me. Why cannot you just use ZONE_MOVABLE?

Indeed, we tried that approach. We were able to allocate a kernel context
from ZONE_MOVABLE by using GFP_MOVABLE. However, we think it is bad
practice for two reasons.

1. It occasionally causes an oops or a system hang, because the kernel
   pages get migrated during swap or compaction.
2. The design intention of ZONE_MOVABLE is literally to keep pages
   movable, so we think allocating a kernel context from that zone goes
   against the intention.

A kernel context allocated out of ZONE_EXMEM is unmovable.
  a kernel context - alloc_pages(GFP_EXMEM,)

A user context allocated out of ZONE_EXMEM is movable.
  a user context - mmap(,,MAP_EXMEM,) - syscall - alloc_pages(GFP_EXMEM | GFP_MOVABLE,)

This is how ZONE_EXMEM supports the two cases.

>
>Best Regards,
>Huang, Ying
>
>> As you well know, among heterogeneous DRAM devices, CXL DRAM is the first PCIe basis device, which allows hot-pluggability, different RAS, and extended connectivity.
>> So, we thought it could be a graceful approach adding a new zone and separately manage the new features.
>>
>> Kindly let me know any advice or comment on our thoughts.
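
To make the two ZONE_EXMEM cases in this reply easier to discuss, here is a
minimal userspace sketch in C. It is only a toy model and not the actual
patch: every MODEL_* name and model_* function is hypothetical, the flag
values are arbitrary, and nothing here is a kernel definition. It only
encodes the rule described in this thread, namely that a request is movable
exactly when the movable flag is passed, and that a kernel context should
not receive a page that may be migrated.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model types; these are NOT the kernel's definitions. */
enum model_zone { MODEL_ZONE_NORMAL, MODEL_ZONE_MOVABLE, MODEL_ZONE_EXMEM };
enum model_ctx  { MODEL_CTX_KERNEL, MODEL_CTX_USER };

/* Arbitrary flag values, chosen only for this illustration. */
#define MODEL_GFP_MOVABLE 0x1u /* caller accepts that the page may migrate */
#define MODEL_GFP_EXMEM   0x2u /* caller targets the proposed ZONE_EXMEM */

/* Select the zone a request targets in this model. */
static enum model_zone model_target_zone(unsigned int gfp)
{
    if (gfp & MODEL_GFP_EXMEM)
        return MODEL_ZONE_EXMEM;
    if (gfp & MODEL_GFP_MOVABLE)
        return MODEL_ZONE_MOVABLE;
    return MODEL_ZONE_NORMAL;
}

/* In this model a page may be migrated only if the movable flag was set. */
static bool model_page_may_migrate(unsigned int gfp)
{
    return (gfp & MODEL_GFP_MOVABLE) != 0;
}

/*
 * A kernel context is treated as unsafe on a page that may migrate,
 * which mirrors the oops/hang reported in this thread. A user context
 * is safe either way in this model.
 */
static bool model_request_is_safe(enum model_ctx ctx, unsigned int gfp)
{
    assert(ctx == MODEL_CTX_KERNEL || ctx == MODEL_CTX_USER);
    if (ctx == MODEL_CTX_KERNEL)
        return !model_page_may_migrate(gfp);
    return true;
}
```

In this model, a kernel request with MODEL_GFP_EXMEM alone targets
MODEL_ZONE_EXMEM and stays unmovable, a user request with
MODEL_GFP_EXMEM | MODEL_GFP_MOVABLE targets the same zone but is movable,
and a kernel request with only MODEL_GFP_MOVABLE is flagged as unsafe.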