Re: [PATCH RFC v3 0/4] mTHP-friendly compression in zsmalloc and zram based on multi-pages

On Wed, Nov 27, 2024 at 5:52 PM Sergey Senozhatsky
<senozhatsky@xxxxxxxxxxxx> wrote:
>
> On (24/11/27 09:20), Barry Song wrote:
> [..]
> > >    390 12736
> > >    395 13056
> > >    404 13632
> > >    410 14016
> > >    415 14336
> > >    418 14528
> > >    447 16384
> > >
> > > E.g. 13632 and 13056 are more than 500 bytes apart.
> > >
> > > > swap-out time(ms)       68711              49908
> > > > swap-in time(ms)        30687              20685
> > > > compression ratio       20.49%             16.9%
> > >
> > > These are not the only numbers to focus on; the really important metrics
> > > are zsmalloc pages-used and zsmalloc max-pages-used.  From those we can
> > > calculate the pool memory usage ratio (the size of the compressed data vs
> > > the number of pages the zsmalloc pool allocated to keep it).
> >
> > To address this, we plan to collect more data and get back to you
> > afterwards. From my understanding, we still have an opportunity
> > to refine CHAIN_SIZE?
>
> Do you mean changing the value?  It's configurable.
>
> > Essentially, each small object might cause some waste within the
> > original PAGE_SIZE. Now, with 4 * PAGE_SIZE, there could be a
> > single instance of waste. If we can manage the ratio, this could be
> > optimized?
>
> All size classes work the same and we merge size-classes with equal
> characteristics.  So in the example above
>
>                 395 13056
>                 404 13632
>
> size-classes #396-403 are merged with size-class #404.  Size-class #404
> splits its zspage into 13632-byte chunks, and any smaller object (e.g. an
> object from size-class #396, which can be just one byte larger than a #395
> object) takes an entire chunk; the rest of the space in the chunk is just
> padding.
>
> CHAIN_SIZE is how we find the optimal balance.  The larger the zspage, the
> more likely we are to squeeze out space for extra objects that would
> otherwise have been pure waste.  A large CHAIN_SIZE also changes the
> characteristics of many size classes, so we merge fewer classes and end up
> with more clusters.  The price, on the other hand, is more physical 0-order
> pages per zspage, which can be painful.  In all the tests I ran, 8 or 10
> worked best.

Thanks very much for the explanation. We’ll gather more data on this and follow
up with you.
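
Just to make sure I've read the merging example correctly, here is a tiny
standalone sketch (not zsmalloc code; the chunk size comes from the class list
you quoted, and the object size is a made-up worst case):

#include <stdio.h>

/*
 * Toy model of a merged size-class: any object that falls into the merged
 * range #396-#403 is stored in a 13632-byte chunk of class #404, and
 * whatever it does not use is padding.
 */
int main(void)
{
	const unsigned int chunk_size = 13632;	/* chunk size of class #404 */
	const unsigned int obj_size   = 13057;	/* hypothetical: one byte over class #395 (13056) */

	printf("a %u-byte object occupies a %u-byte chunk, padding = %u bytes\n",
	       obj_size, chunk_size, chunk_size - obj_size);
	return 0;
}

So in the worst case an object in this cluster pays close to 576 bytes of
padding, which matches the 500+ byte gap you pointed out.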

>
> [..]
> > > another option might be to just use a faster algorithm and then utilize
> > > post-processing (re-compression with zstd or writeback) for memory
> > > savings?
> >
> > The concern lies in power consumption
>
> But the power consumption concern also applies to the "decompress just one
> middle page from a very large object" case, and to size-class de-fragmentation,

That's why we have "[patch 4/4] mm: fall back to four small folios if mTHP
allocation fails" to address the issue of "decompressing just one middle page
from a very large object."  I assume that recompression and writeback should
also focus on large objects if the original compression involves multiple pages?

> which requires moving around lots of objects in order to form more full
> zspages and release empty ones.  There are concerns everywhere; how

I assume the cost of defragmentation is roughly M * N, where:
* M is the number of objects,
* N is the average size of an object.

With large objects, M drops to 1/4 of the original count. N grows, but only to
just under 4 times the original object size (larger blocks compress better), so
the overall M * N ends up slightly smaller than before?
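
To put rough numbers on it, a quick sketch (the two ratios are the ones from
the table above, 20.49% vs 16.9%, assuming the latter is the multi-page path;
the object count is purely hypothetical):

#include <stdio.h>

/*
 * Rough defragmentation cost model, purely illustrative: "cost" is just the
 * number of objects moved times the average compressed object size.
 */
int main(void)
{
	const unsigned long m       = 100000;	/* hypothetical object count   */
	const unsigned long n       = 840;	/* 4KB  * 20.49% ~= 840 bytes  */
	const unsigned long n_large = 2769;	/* 16KB * 16.9%  ~= 2769 bytes */

	printf("4KB objects:  %lu\n", m * n);			/* ~84,000,000 */
	printf("16KB objects: %lu\n", (m / 4) * n_large);	/* ~69,225,000 */
	return 0;
}

With those ratios the product drops by roughly 17-18%, even though each object
is almost 4x larger.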

> many of them are measured and analyzed, and either ruled out or confirmed,
> is another question.

In phone scenarios, if recompression uses zstd and the original compression is
based on lz4 with 4KB blocks, the cost of obtaining zstd-compressed objects
would be:

* A: Compression of 4 × 4KB using lz4
* B: Decompression of 4 × 4KB using lz4
* C: Compression of 4 × 4KB using zstd

By leveraging the speed advantages of mTHP swap and zstd's large-block
compression, the cost becomes:

* D: Compression of one 16KB block using zstd

Since D is significantly smaller than C, and A and B are both non-zero, it
follows that D < A + B + C?
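
A minimal sketch of that comparison with placeholder numbers (arbitrary cost
units, not measurements; the only assumption baked in is D < C with A, B > 0):

#include <stdio.h>

/*
 * Illustrative only: arbitrary per-16KB cost units, not measured data.
 * The point is the shape of the comparison, not the values.
 */
int main(void)
{
	const double a = 1.0;	/* hypothetical: lz4 compress 4 x 4KB   */
	const double b = 0.5;	/* hypothetical: lz4 decompress 4 x 4KB */
	const double c = 3.0;	/* hypothetical: zstd compress 4 x 4KB  */
	const double d = 2.5;	/* hypothetical: zstd compress 1 x 16KB */

	printf("recompression path A+B+C = %.1f, direct zstd path D = %.1f\n",
	       a + b + c, d);
	return 0;
}

As long as D stays below C, the direct path wins by at least A + B, whatever
the exact throughput numbers turn out to be.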

Thanks
Barry




