On Mon, Nov 18, 2024 at 10:56 PM Sergey Senozhatsky
<senozhatsky@xxxxxxxxxxxx> wrote:
>
> On (24/11/12 09:31), Barry Song wrote:
> [..]
> > > Do you have any data on how this would perform with the upstream
> > > kernel, i.e. without a large folio pool and the workaround, and
> > > whether large granularity compression is worth having without
> > > those patches?
> >
> > I'd say large granularity compression isn't a problem, but large
> > granularity decompression could be.
> >
> > The worst case would be if we swap out a large block, such as 16KB,
> > but end up swapping in 4 times due to allocation failures, falling
> > back to smaller folios. In this scenario, we would need to perform
> > three redundant decompressions. I will work with Tangquan to
> > provide this data this week.
>
> Well, apart from that... I sort of don't know.
>
> This seems to be exclusively for the swap case (or do file-systems use
> mTHP too?) and zram/zsmalloc don't really focus on one particular
> usage scenario; pretty much all of our features can be used regardless
> of what zram is backing up - be it a swap partition or a mounted fs.

Yes, some filesystems also support mTHP. A simple grep command can
list them all:

fs % git grep mapping_set_large_folios
afs/inode.c:            mapping_set_large_folios(inode->i_mapping);
afs/inode.c:            mapping_set_large_folios(inode->i_mapping);
bcachefs/fs.c:          mapping_set_large_folios(inode->v.i_mapping);
erofs/inode.c:          mapping_set_large_folios(inode->i_mapping);
nfs/inode.c:            mapping_set_large_folios(inode->i_mapping);
smb/client/inode.c:     mapping_set_large_folios(inode->i_mapping);
zonefs/super.c:         mapping_set_large_folios(inode->i_mapping);

More filesystems might begin to support large folios over time.

In the current implementation, only the size is considered when
deciding whether to apply large block compression:

static inline bool want_multi_pages_comp(struct zram *zram,
					 struct bio *bio)
{
	u32 index = bio->bi_iter.bi_sector >> SECTORS_PER_PAGE_SHIFT;

	if (bio->bi_io_vec->bv_len >= ZCOMP_MULTI_PAGES_SIZE)
		return true;
	...
}

If we run into too many corner cases with filesystems (such as
excessive recompression or partial reads), we could additionally
require the folio to be anonymous before returning true (see the
sketch in the P.S. below).

For swap, we are working to get things under control. The challenging
scenario that could lead to many partial reads arises when mTHP
allocation fails during swap-in: in that case, do_swap_page() will
swap in only a single small folio, even though the entire 16KB block
has already been decompressed.

> Another thing is that I don't see how to integrate support for these
> large objects with post-processing: recompression and writeback.
> Well, recompression is okay-ish, I guess, but writeback is not.
> Writeback works in PAGE_SIZE units; we get that worst-case scenario
> here. So, yeah, there are many questions.

For zram writeback, my intuition is that we should write back the
entire large block (4 * PAGE_SIZE) at once. If a large block is idle
or marked huge in zram, that state generally applies to the whole
block, so writing it back as one unit seems natural (see the second
sketch below). This isn't implemented yet, likely because writeback
hasn't been enabled on our phones so far.

> p.s. Sorry for late reply. I just started looking at the series and
> don't have any solid opinions yet.

Thank you for starting to review the series. Your suggestions are
greatly appreciated.

Best Regards
Barry
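
P.S. To make the anonymous-folio idea above concrete, here is a rough,
untested sketch of how want_multi_pages_comp() could restrict large
block compression to anonymous memory. page_folio() and
folio_test_anon() are existing kernel helpers; the rest follows the
snippet quoted above, and the checks elided there ("...") would still
need to apply. This is only an illustration, not a tested patch:

static inline bool want_multi_pages_comp(struct zram *zram,
					 struct bio *bio)
{
	struct folio *folio = page_folio(bio->bi_io_vec->bv_page);

	/* Too small for a large block; take the PAGE_SIZE path */
	if (bio->bi_io_vec->bv_len < ZCOMP_MULTI_PAGES_SIZE)
		return false;

	/*
	 * Skip large block compression for file-backed folios so that
	 * filesystem partial reads never hit the multi-page path;
	 * anonymous (swap) folios keep the large-granularity behavior.
	 */
	return folio_test_anon(folio);
}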
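
P.P.S. For writeback, a very rough sketch of what "write back the
whole block at once" could look like in the writeback loop. I'm
assuming a ZCOMP_MULTI_PAGES_NR constant (subpages per large block), a
ZRAM_MULTI_PAGES slot flag and a zram_writeback_multi_pages() helper -
all three are hypothetical names on my side, not part of the series as
posted; only zram_test_flag() and ZRAM_WB exist in zram today:

	for (index = 0; index < nr_pages; index++) {
		if (zram_test_flag(zram, index, ZRAM_MULTI_PAGES)) {
			/* only handle the aligned head slot of a block */
			if (index % ZCOMP_MULTI_PAGES_NR)
				continue;
			/*
			 * Decompress the whole block once, submit one
			 * 4 * PAGE_SIZE bio to the backing device, then
			 * mark all subpage slots ZRAM_WB.
			 */
			zram_writeback_multi_pages(zram, index);
			continue;
		}
		/* existing PAGE_SIZE writeback path */
	}

That would avoid the worst case Sergey describes, at the cost of
writing back idle blocks in 16KB units rather than 4KB ones.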