Re: RE: RE(2): FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL

On Tue, Apr 04, 2023 at 11:31:08AM +0300, Mike Rapoport wrote:
> On Fri, Mar 31, 2023 at 08:45:25PM +0900, Kyungsan Kim wrote:
> > Thank you, Mike Rapoport, for joining the discussion and sharing your thoughts.
> > 
> > >Hi,
> > >
> > >On Thu, Mar 23, 2023 at 07:51:05PM +0900, Kyungsan Kim wrote:
> > >> I appreciate Dan's careful advice.
> > >> 
> > >> >Kyungsan Kim wrote:
> > >> >[..]
> > >> >> >In addition to CXL memory, we may have other kinds of memory in the
> > >> >> >system, for example, HBM (High Bandwidth Memory), memory in FPGA card,
> > >> >> >memory in GPU card, etc.  I guess that we need to consider them
> > >> >> >together.  Do we need to add one zone type for each kind of memory?
> > >> >> 
> > >> >> We also don't think a new zone is needed for every single memory
> > >> >> device.  Our viewpoint is that ZONE_NORMAL alone is no longer enough
> > >> >> to manage multiple volatile memory devices, given the growing number
> > >> >> of device types.  Including CXL DRAM, we think the ZONE_EXMEM can be
> > >> >> used to represent extended volatile memories that have different HW
> > >> >> characteristics.
> > >> >
> > >> >Some advice for the LSF/MM discussion, the rationale will need to be
> > >> >more than "we think the ZONE_EXMEM can be used to represent extended
> > >> >volatile memories that have different HW characteristics". It needs to
> > >> >be along the lines of "yes, to date Linux has been able to describe DDR
> > >> >with NUMA effects, PMEM with high write overhead, and HBM with improved
> > >> >bandwidth not necessarily latency, all without adding a new ZONE, but a
> > >> >new ZONE is absolutely required now to enable use case FOO, or address
> > >> >unfixable NUMA problem BAR." Without FOO and BAR to discuss the code
> > >> >maintainability concern of "fewer degrees of freedom in the ZONE
> > >> >dimension" starts to dominate.
> > >> 
> > >> One problem we experienced occurred with the combination of the hot-remove and kernel-space allocation use cases.
> > >> ZONE_NORMAL allows kernel-context allocations, but it does not allow hot-remove because kernel data stays resident the whole time.
> > >> ZONE_MOVABLE allows hot-remove thanks to page migration, but it only allows userspace allocations.
> > >> As an alternative, we allocated kernel-context memory from ZONE_MOVABLE by adding the GFP_MOVABLE flag.
> > >> In that case, oopses and system hangs occasionally occurred because ZONE_MOVABLE pages can be swapped out.
> > >> We resolved the issue using ZONE_EXMEM, which allows selectively choosing between the two use cases.
> > >> As you well know, among heterogeneous DRAM devices, CXL DRAM is the first PCIe-based device, which allows hot-pluggability, different RAS, and extended connectivity.
> > >> So we thought adding a new zone and managing the new features separately could be a graceful approach.
> > >
> > >This still does not describe the use cases that require having
> > >kernel allocations on CXL.mem.
> > >
> > >I believe it's important to start with an explanation of *why* it is
> > >important to have kernel allocations on removable devices.
> > > 
> > 
> > In general, a memory system with DDR/CXL DRAM will have near/far memory.
> > And we think the kernel already includes memory tiering solutions: Meta TPP, zswap, and the page cache.
> > Some kernel contexts would prefer fast memory, for example hot data with temporal locality, or data that needs fast processing such as metadata or indexes.
> > Others would be fine with slow memory, for example a zswap page that is in use while swapping.
> 
> The point of zswap, IIUC, is to have a small and fast swap device, and
> compression is required to better utilize DRAM capacity at the expense of
> CPU time.
> 
> Presuming CXL memory will have larger capacity than DRAM, why not skip the
> compression and use CXL as a swap device directly?

I would shy away from saying CXL memory should be used for swap. I see a
swap device as storing pages in a manner that is no longer directly
addressable by the CPU.

Migrating pages to a CXL device is a reasonable approach and I believe we
have the ability to do this in the page reclaim code. 
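
To make the distinction concrete, here is a toy userspace model in Python.
It is only a sketch and not kernel code: the Tier, Page, reclaim() and
addressable() names are made up for this example, and the "coldest first"
policy is an assumption chosen for simplicity. It shows cold pages being
demoted from a fast tier to a slow tier, where they stay directly
addressable, with swap-out used only once the slow tier is full.

```python
from dataclasses import dataclass, field


@dataclass
class Page:
    pfn: int
    last_access: int  # hypothetical logical timestamp; lower means colder


@dataclass
class Tier:
    name: str
    capacity: int
    pages: dict = field(default_factory=dict)  # pfn -> Page

    def free(self) -> int:
        return self.capacity - len(self.pages)


def reclaim(fast: Tier, slow: Tier, swapped: set, nr_to_free: int):
    """Free nr_to_free slots in `fast`, coldest pages first.

    A victim is demoted to `slow` while `slow` has room; otherwise it is
    swapped out and is no longer present in any tier.
    """
    victims = sorted(fast.pages.values(), key=lambda p: p.last_access)
    demoted = swapped_out = 0
    for page in victims[:nr_to_free]:
        del fast.pages[page.pfn]
        if slow.free() > 0:
            slow.pages[page.pfn] = page  # still addressable, just slower
            demoted += 1
        else:
            swapped.add(page.pfn)  # must be faulted back in before use
            swapped_out += 1
    return demoted, swapped_out


def addressable(pfn: int, *tiers: Tier) -> bool:
    """True if the page is resident in one of the given memory tiers."""
    return any(pfn in tier.pages for tier in tiers)


if __name__ == "__main__":
    ddr = Tier("ddr", capacity=4)
    cxl = Tier("cxl", capacity=2)
    swap = set()
    for i in range(4):
        ddr.pages[i] = Page(pfn=i, last_access=i)
    print(reclaim(ddr, cxl, swap, nr_to_free=3))
    print(addressable(0, ddr, cxl), addressable(2, ddr, cxl))
```

In this model a demoted page costs a slower access, while a swapped page
costs a fault and I/O before it can be touched again, which is why I would
keep the two concepts separate.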

> 
> And even supposing there's an advantage in putting zswap on CXL memory,
> why can't that be done with node-based APIs instead of warranting a new zone?
> 
> -- 
> Sincerely yours,
> Mike.



