Re: [PATCH RFC v3 0/4] mTHP-friendly compression in zsmalloc and zram based on multi-pages

Barry Song <21cnbao@xxxxxxxxx> · Fri, 29 Nov 2024 09:56:59 +1300

On Wed, Nov 27, 2024 at 6:04 PM Sergey Senozhatsky
<senozhatsky@xxxxxxxxxxxx> wrote:
>
> On (24/11/27 09:31), Barry Song wrote:
> > On Tue, Nov 26, 2024 at 11:53 PM Sergey Senozhatsky
> > <senozhatsky@xxxxxxxxxxxx> wrote:
> > >
> > > On (24/11/26 14:09), Sergey Senozhatsky wrote:
> > > > > swap-out time(ms)       68711              49908
> > > > > swap-in time(ms)        30687              20685
> > > > > compression ratio       20.49%             16.9%
> > >
> > > I'm also sort of curious if you'd use zstd with pre-trained user
> > > dictionary [1] (e.g. based on a dump of your swap-file under most
> > > common workloads) would it give you desired compression ratio
> > > improvements (on current zram, that does single page compression).
> > >
> > > [1] https://github.com/facebook/zstd?tab=readme-ov-file#the-case-for-small-data-compression
> >
> > Not yet, but it might be worth trying. A key difference between servers and
> > Android phones is that phones have millions of different applications
> > downloaded from the Google Play Store or other sources.
>
> Maybe yes maybe not, I don't know.  It could be that 99% of users
> use the same 1% apps out of those millions.
>
> > In this case, would using a dictionary be a feasible approach? Apologies
> > if my question seems too naive.
>
> It's a good question, and there is probably only one way to answer
> it - through experiments, it's data dependent, so it's case-by-case.

Sure, we may collect data on the most popular apps (e.g., the top 100) and
train zstd using their anonymous data to identify patterns. We’ll follow up
with you afterward.
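
To make the dictionary idea concrete, here is a minimal sketch. It uses
Python's zlib preset dictionary (the zdict argument) as a stand-in for a
trained zstd dictionary, only because zlib ships with the standard library;
it is not zstd's dictionary trainer and not what zram does. The page
contents, make_page(), the WORDS vocabulary and the 4 KiB page size are all
hypothetical and chosen for illustration, not taken from real swap data.

```python
import zlib

PAGE_SIZE = 4096  # assumed base page size

# Hypothetical vocabulary; real swap pages would look nothing this tidy.
WORDS = [b"activity", b"window", b"surface", b"binder",
         b"layout", b"bitmap", b"cursor", b"intent"]

def make_page(seed: int) -> bytes:
    """Build one synthetic 4 KiB page: shared key names, per-page values."""
    lines = []
    i = 0
    for a in WORDS:
        for b in WORDS:
            value = zlib.crc32(b"%d:%d" % (seed, i))  # deterministic pseudo-random value
            lines.append(b"%s.%s.handle=0x%08x\n" % (a, b, value))
            i += 1
    return b"".join(lines)[:PAGE_SIZE].ljust(PAGE_SIZE, b"\0")

def compress_page(page: bytes, zdict: bytes = b"") -> bytes:
    """Compress a single page, optionally primed with a preset dictionary."""
    comp = zlib.compressobj(level=6, zdict=zdict) if zdict else zlib.compressobj(level=6)
    return comp.compress(page) + comp.flush()

def decompress_page(blob: bytes, zdict: bytes = b"") -> bytes:
    """Inverse of compress_page(); the same dictionary must be supplied."""
    decomp = zlib.decompressobj(zdict=zdict) if zdict else zlib.decompressobj()
    return decomp.decompress(blob) + decomp.flush()

# "Training" here is just concatenating a few sample pages.
zdict = b"".join(make_page(s) for s in range(4))
page = make_page(42)  # a page that was not part of the samples

plain = compress_page(page)
primed = compress_page(page, zdict)
assert decompress_page(primed, zdict) == page
print(f"no dictionary: {len(plain)} bytes, with dictionary: {len(primed)} bytes")
```

Whether real app data shows a similar gain is exactly what the experiment
would have to tell us; the synthetic pages above are built to share
structure, which real workloads may or may not do.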

>
> > On the other hand, the advantage of a pre-trained user dictionary
> > doesn't outweigh the benefits of large block compression? Can’t both
> > be used together?
>
> Well, so far the approach has many unmeasured unknowns and corner
> cases, I don't think I personally even understand all of them to begin

I agree we can make an effort to dig deeper and collect more data, analyzing
as many corner cases as possible, but many unknowns are a common
characteristic of new things :-)

> with.  Not sure if I have a way to measure and analyze, that mTHP
> swapout seems like a relatively new thing and it also seems that you
> are still fixing some of its issues/shortcomings.

A challenge is determining how to make mTHP fully transparent (e.g., not
dependent on sysfs controls for enabling/disabling) across various workloads.
The default policy may not always be optimal for all workloads.

Despite that, there are certainly benefits we can gain from mTHP
within zsmalloc/zram.
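
As a rough illustration of why a larger compression unit can help, the
sketch below compresses the same 64 KiB of data once as sixteen independent
4 KiB pages and once as a single block. It uses zlib purely as a
stand-in compressor, not any of zram's algorithms, and the data from
make_block() is synthetic and hypothetical, so the sizes it prints say
nothing about real workloads. The ratio is computed as compressed/original,
as in the table quoted earlier in the thread.

```python
import zlib

PAGE_SIZE = 4096    # assumed base page size
BLOCK_PAGES = 16    # 16 pages = 64 KiB, one possible mTHP size

def make_block(seed: int) -> bytes:
    """Synthetic data: pages that resemble each other but are not identical."""
    pages = []
    for p in range(BLOCK_PAGES):
        rows = b"".join(
            b"slot[%03d] owner=task-%02d refs=%08x\n"
            % (i, i % 17, zlib.crc32(b"%d/%d/%d" % (seed, p, i)))
            for i in range(96)
        )
        pages.append(rows[:PAGE_SIZE].ljust(PAGE_SIZE, b"\0"))
    return b"".join(pages)

def per_page_size(block: bytes) -> int:
    """Total compressed size when every 4 KiB page is compressed on its own."""
    return sum(
        len(zlib.compress(block[off:off + PAGE_SIZE], 6))
        for off in range(0, len(block), PAGE_SIZE)
    )

def whole_block_size(block: bytes) -> int:
    """Compressed size when the whole block is one compression unit."""
    return len(zlib.compress(block, 6))

block = make_block(1)
small = per_page_size(block)
large = whole_block_size(block)
print(f"per-page:    {small} bytes ({100.0 * small / len(block):.2f}%)")
print(f"whole block: {large} bytes ({100.0 * large / len(block):.2f}%)")
```

The single-block case lets the compressor reuse matches from earlier pages,
which independent per-page compression cannot do.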

Thanks
Barry