> On May 14, 2024, at 3:30 PM, Adam Manzanares <a.manzanares@xxxxxxxxxxx> wrote:
>
> On Tue, May 14, 2024 at 01:43:29PM +0200, Yiannis Nikolakopoulos wrote:
>> Hello all,
>>
>> This is almost literally last minute and the work itself is only
>> getting started. But since I'm virtually attending, I thought to give
>> it a shot in case it is of interest.
>>
>> Background: at ZeroPoint Technologies we are developing inline memory
>> compression IP. Currently we are focusing on CXL type 3 devices
>> (memory expanders), effectively introducing a compressed memory tier
>> (i.e. fulfilling the OCP specification "Hyperscale CXL Tiered Memory
>> Expander Specification").
>>
>> To utilize the memory saved due to compression, we oversubscribe the
>> Device Physical Address Space (DPA) in addition to some custom .io
>> interfaces. If there is interest, I would be glad to present these
>> APIs and how the host's point of view changes compared to a "typical"
>> memory tiering system. The goal would be to get some early feedback
>> and direction for our upstream driver development, before we start
>> pushing the first RFCs.
>
> I am interested in understanding the interfaces and how the memory will be
> consumed and presented by the MM subsystem. It is my understanding that
> we have a 30 min slot for CXL related topics open this morning. I think this
> would be a good fit.
>

After attending the talk, I think the use cases that could benefit from this
approach really need to be clarified and justified. Maybe I am missing
something here, but I think the compaction scheme could be a potential pain
point. Even a single application may allocate and free memory very
frequently. If we place several compressed data portions into one unit of
physical granularity (for example, 4K or any other size), then free
operations will definitely create holes, that is, introduce fragmentation.
Memory operations are really fast, so such fragmentation could build up
quickly and become really significant.
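
To make the concern a bit more concrete, here is a minimal sketch in Python.
It is not a model of the actual device. Everything in it is an assumption made
only for illustration: the compressed sizes (512 to 4096 bytes), the
append-only packing into 4K blocks with no chunk spanning two blocks, and
the uniformly random frees. It compares the bytes that are still live with
the bytes that stay pinned, because a block cannot be reclaimed while any
chunk in it is live.

```python
import random

BLOCK = 4096  # assumed physical granularity, as in the 4K example above


def simulate(n_chunks=10_000, free_fraction=0.5, seed=0):
    """Pack compressed chunks into blocks, free some, report pinned space.

    Returns (live_bytes, held_bytes, total_blocks). All parameters and the
    size distribution are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)
    # Hypothetical compressed sizes in 64-byte steps, from 512 up to 4096.
    sizes = [rng.randrange(512, BLOCK + 1, 64) for _ in range(n_chunks)]

    # Append-only packing: open a new block when the next chunk does not fit.
    blocks = [[]]  # each block holds the ids of the chunks placed in it
    used = [0]     # bytes used in each block
    for cid, size in enumerate(sizes):
        if used[-1] + size > BLOCK:
            blocks.append([])
            used.append(0)
        blocks[-1].append(cid)
        used[-1] += size

    # Free a random subset of the chunks.
    freed = set(rng.sample(range(n_chunks), int(n_chunks * free_fraction)))

    live_bytes = sum(sizes[c] for c in range(n_chunks) if c not in freed)
    # A block stays pinned as long as at least one of its chunks is live.
    pinned = sum(1 for b in blocks if any(c not in freed for c in b))
    return live_bytes, pinned * BLOCK, len(blocks)


if __name__ == "__main__":
    live, held, total = simulate()
    print(f"blocks allocated: {total}")
    print(f"live bytes:       {live}")
    print(f"bytes still held: {held}")
```

Without compaction, the gap between the held bytes and the live bytes is
space that neither the host nor the device can reuse.
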
Such fragmentation could be even more critical with multiple applications
and multiple hosts. It sounds to me like this approach could really require
a GC or a defragmentation subsystem. But such a subsystem could introduce
additional latency or affect applications' performance. So maybe a smart
allocation policy can help here; otherwise, a really smart and efficient
defragmentation or GC policy would need to be introduced. It is definitely
an interesting problem to think about. :)

Thanks,
Slava.