Re: [Resend LSF/MM/BPF TOPIC] A case of a CXL compressed memory tier

Viacheslav Dubeyko <slava@xxxxxxxxxxx> · Sun, 19 May 2024 13:37:07 +0300

> On May 14, 2024, at 3:30 PM, Adam Manzanares <a.manzanares@xxxxxxxxxxx> wrote:
> 
> On Tue, May 14, 2024 at 01:43:29PM +0200, Yiannis Nikolakopoulos wrote:
>> Hello all,
>> 
>> This is almost literally last minute and the work itself is only
>> getting started. But since I'm virtually attending, I thought to give
>> it a shot in case it is of interest.
>> 
>> Background: at ZeroPoint Technologies we are developing inline memory
>> compression IP. Currently we are focusing on CXL type 3 devices
>> (memory expanders), effectively introducing a compressed memory tier
>> (i.e. fulfilling the OCP specification "Hyperscale CXL Tiered Memory
>> Expander Specification").
>> 
>> To utilize the memory saved due to compression, we oversubscribe the
>> Device Physical Address Space (DPA) in addition to some custom .io
>> interfaces. If there is interest, I would be glad to present these
>> APIs and how the host's point of view changes compared to a "typical"
>> memory tiering system. The goal would be to get some early feedback
>> and direction for our upstream driver development, before we start
>> pushing the first RFCs.
> 
> I am interested in understanding the interfaces and how the memory will be
> consumed and presented by the MM subsystem. It is my understanding that
> we have a 30 min slot for CXL related topics open this morning. I think this 
> would be a good fit.
> 

After attending the talk, I think there is a real need to clarify/justify the use cases
that could benefit from this approach.

Maybe I am missing something here, but I think the compaction scheme could be a potential
pain. Even a single application can allocate and free memory very frequently. If we place
several compressed data portions into one physical granule (for example, 4K or any other
size), then free operations will definitely create holes and introduce fragmentation.
Memory operations are really fast, so such fragmentation could build up quickly and become
really significant. It could be even more critical in the case of multiple applications and
multiple hosts. It sounds to me like this approach could really require a GC or a
defragmentation subsystem. But such a subsystem could introduce additional latency or
affect applications’ performance. So maybe a smart allocation policy can help here;
otherwise, a really smart and efficient defragmentation or GC policy needs to be introduced.
Definitely, it’s an interesting problem to think about. :)
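
To make the concern a bit more concrete, here is a toy model of the effect. It is only a
sketch under my own assumptions and says nothing about how the actual device lays out
data: the 4K granule, the sequential packing policy, and the names pack(), free_chunks()
and stats() are all hypothetical. Compressed chunks are packed back to back into
granules, some of them are freed, and then we compare how many granules are still pinned
by live data against the lower bound that a perfect compactor could reach.

```python
GRANULE = 4096  # bytes; hypothetical physical granularity


def pack(sizes, granule=GRANULE):
    """Pack compressed chunks back to back, opening a new granule when the
    next chunk does not fit. Assumes every size is <= granule.
    Returns a list of granules, each a list of (chunk_id, size)."""
    granules = [[]]
    used = 0
    for chunk_id, size in enumerate(sizes):
        if used + size > granule:
            granules.append([])
            used = 0
        granules[-1].append((chunk_id, size))
        used += size
    return granules


def free_chunks(granules, freed):
    """Drop the freed chunk ids; the holes they leave are not reused here."""
    return [[(cid, s) for cid, s in g if cid not in freed] for g in granules]


def stats(granules, granule=GRANULE):
    """Return (granules still pinned, live bytes, lower bound on granules)."""
    pinned = sum(1 for g in granules if g)
    live = sum(s for g in granules for _, s in g)
    needed = -(-live // granule)  # ceiling division
    return pinned, live, needed


# 16 chunks that each compressed to 1 KiB fill 4 granules exactly.
layout = pack([1024] * 16)
# Free all but one chunk per granule.
freed = {cid for cid in range(16) if cid % 4 != 0}
print(stats(free_chunks(layout, freed)))
```

In this example only 4 KiB of data is still live, but all four granules stay pinned, so
the space saved by compression is lost until something compacts or reuses the holes.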

Thanks,
Slava.