Re: FW: [LSF/MM/BPF TOPIC] SMDK inspired MM changes for CXL

Mike Rapoport <rppt@xxxxxxxxxx> · Mon, 3 Apr 2023 11:44:23 +0300

Hi Dragan,

On Thu, Mar 30, 2023 at 05:03:24PM -0500, Dragan Stancevic wrote:
> On 3/26/23 02:21, Mike Rapoport wrote:
> > Hi,
> > 
> > [..]
> > > One problem we experienced occurred in the combination of hot-remove and kernelspace allocation use cases.
> > > ZONE_NORMAL allows kernel context allocation, but it does not allow hot-remove because the kernel is resident all the time.
> > > ZONE_MOVABLE allows hot-remove thanks to page migration, but it only allows userspace allocation.
> > > Alternatively, we allocated a kernel context out of ZONE_MOVABLE by adding the GFP_MOVABLE flag.
> > > In that case, oopses and system hangs occasionally occurred because ZONE_MOVABLE can be swapped.
> > > We resolved the issue using ZONE_EXMEM by allowing a selective choice between the two use cases.
> > > As you well know, among heterogeneous DRAM devices, CXL DRAM is the first PCIe-based device, which allows hot-pluggability, different RAS, and extended connectivity.
> > > So, we thought it could be a graceful approach to add a new zone and manage the new features separately.
> > 
> > This still does not describe the use cases that require having
> > kernel allocations on CXL.mem.
> > 
> > I believe it's important to start with an explanation of *why* it is
> > important to have kernel allocations on removable devices.
> 
> Hi Mike,
> 
> not speaking for Kyungsan here, but I am starting to tackle hypervisor
> clustering and VM migration over cxl.mem [1].
> 
> And in my mind, at least one reason I can think of for having kernel
> allocations from cxl.mem devices is where you have multiple VH connections
> sharing the memory [2]. For example, you have a user space application
> stored in cxl.mem, and you want the metadata about this
> process/application that the kernel keeps on one hypervisor to be "passed
> on" to another hypervisor. So basically, the same way processors in a
> single hypervisor cooperate on memory, you extend that across processors
> that span physical hypervisors. If that makes sense...

Let me reiterate to make sure I understand your example.
If we focus on the VM use case, your suggestion is to store the VM's memory
and associated KVM structures on a CXL.mem device shared by several nodes.
Even putting aside the aspect of keeping KVM structures on presumably
slower memory, what will ZONE_EXMEM provide that cannot be accomplished
by having the CXL memory in a memoryless node and using that node to
allocate VM metadata?
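
For reference, the constraint quoted earlier in this thread (kernel
allocations pin memory and block hot-remove, while movable-only memory
refuses kernel allocations) can be sketched as a toy userspace model.
This is only an illustration under stated assumptions: it is not kernel
code, and struct region, enum alloc_kind, region_alloc() and
region_can_hot_remove() are hypothetical names that do not exist in the
kernel. It models only the two behaviours described above and says
nothing about how ZONE_EXMEM or a dedicated node would be implemented.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy userspace model; none of these names are kernel APIs. */
enum alloc_kind {
    ALLOC_KERNEL,   /* unmovable, e.g. kernel metadata */
    ALLOC_MOVABLE   /* migratable, e.g. user pages */
};

struct region {
    const char *name;
    bool kernel_ok;          /* accepts unmovable kernel allocations */
    unsigned int unmovable;  /* unmovable pages currently held */
    unsigned int movable;    /* movable pages currently held */
};

/* Try to take one page of the given kind; false if the region refuses. */
static bool region_alloc(struct region *r, enum alloc_kind kind)
{
    assert(r != NULL);
    if (kind == ALLOC_KERNEL) {
        if (!r->kernel_ok)
            return false;
        r->unmovable++;
    } else {
        r->movable++;
    }
    return true;
}

/* Movable pages can be migrated away first; unmovable pages block removal. */
static bool region_can_hot_remove(const struct region *r)
{
    assert(r != NULL);
    return r->unmovable == 0;
}
```

In this model, a region initialised as { "ZONE_NORMAL", true, 0, 0 } stops
being removable after its first ALLOC_KERNEL request, while a region
initialised as { "ZONE_MOVABLE", false, 0, 0 } stays removable but rejects
ALLOC_KERNEL outright.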

> [1] A high-level explanation is at http://nil-migration.org
> [2] Compute Express Link Specification r3.0, v1.0 8/1/22, Page 51, figure
> 1-4, black color scheme circle(3) and bars.
> 
> --
> Peace can only come as a natural consequence
> of universal enlightenment -Dr. Nikola Tesla
> 

-- 
Sincerely yours,
Mike.