On Thu, Nov 7, 2024 at 11:25 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
>
> On Thu, Nov 7, 2024 at 5:23 AM Usama Arif <usamaarif642@xxxxxxxxx> wrote:
> >
> >
> > On 22/10/2024 00:28, Barry Song wrote:
> > >> From: Tangquan Zheng <zhengtangquan@xxxxxxxx>
> > >>
> > >> +static int zram_bvec_write_multi_pages(struct zram *zram, struct bio_vec *bvec,
> > >> +                                       u32 index, int offset, struct bio *bio)
> > >> +{
> > >> +	if (is_multi_pages_partial_io(bvec))
> > >> +		return zram_bvec_write_multi_pages_partial(zram, bvec, index, offset, bio);
> > >> +	return zram_write_page(zram, bvec->bv_page, index);
> > >> +}
> > >> +
> >
> > Hi Barry,
> >
> > I started reviewing this series just to get a better idea of whether we can do
> > something similar for zswap. I haven't looked at the zram code before, so this
> > might be a basic question: how would you end up in
> > zram_bvec_write_multi_pages_partial if using zram for swap?
>
> Hi Usama,
>
> There's a corner case where, for instance, a 32KiB mTHP is swapped out. If
> userspace then performs MADV_DONTNEED on the 0~16KiB portion of this original
> mTHP, it now consists of 8 swap entries (the mTHP has been released and
> unmapped). With swap0-swap3 released due to DONTNEED, those slots become
> available for reallocation, and other folios may be swapped out to them. The
> result is a mix of the new, smaller folios and the original 32KiB mTHP.

Sorry, I forgot to mention that the assumption is ZSMALLOC_MULTI_PAGES_ORDER=3,
so data is compressed in 32KiB blocks.

With Chris' and Kairui's new swap optimization, this should be minor, as each
cluster has its own order. However, I recall that order-0 can still steal swap
slots from other orders' clusters when swap space is limited, by scanning all
slots? Please correct me if I'm wrong, Kairui and Chris.

>
> >
> > We only swap out whole folios. If ZCOMP_MULTI_PAGES_SIZE=64K, any folio
> > smaller than 64K will end up in zram_bio_write_page. Folios greater than or
> > equal to 64K would be dispatched by zram_bio_write_multi_pages to
> > zram_bvec_write_multi_pages in 64K chunks, so e.g. a 128K folio would end up
> > calling zram_bvec_write_multi_pages twice.
>
> In v2, I changed the default order to 2, allowing all anonymous mTHP to
> benefit from this feature.
>
> >
> > Or is this for the case when you are using zram not for swap? In that case,
> > I probably don't need to consider the zram_bvec_write_multi_pages_partial
> > write case for zswap.
> >
> > Thanks,
> > Usama
>

Thanks
Barry
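
P.S. To make the corner case above concrete, here is a minimal userspace sketch
(not part of the series; it only assumes MADV_PAGEOUT is available, 32KiB anon
mTHP is enabled under /sys/kernel/mm/transparent_hugepage, and zram is the swap
backend). After the partial MADV_DONTNEED, any new folio later swapped out into
the freed slots is what would hit the zram_bvec_write_multi_pages_partial()
path discussed above.

/*
 * Minimal sketch, assuming MADV_PAGEOUT (Linux 5.4+), 32KiB anon mTHP
 * enabled, and zram as the swap device. The aligned allocation only makes
 * an mTHP allocation likely; it does not guarantee one.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
	const size_t sz = 32 * 1024;	/* one order-3 mTHP worth of data */
	char *buf;

	if (posix_memalign((void **)&buf, sz, sz)) {
		perror("posix_memalign");
		return 1;
	}

	/* Fault in the whole 32KiB, ideally as a single mTHP. */
	memset(buf, 0xaa, sz);

	/* Ask for reclaim: with the series applied, the 32KiB is compressed as one block. */
	if (madvise(buf, sz, MADV_PAGEOUT))
		perror("madvise(MADV_PAGEOUT)");

	/*
	 * Drop only the first half: swap slots 0-3 of the original block become
	 * reusable while the tail 16KiB still lives inside the 32KiB object.
	 * Other folios swapped out to those freed slots later are the partial writes.
	 */
	if (madvise(buf, sz / 2, MADV_DONTNEED))
		perror("madvise(MADV_DONTNEED)");

	/* Touching the tail needs a partial read of the multi-page object. */
	printf("tail byte: 0x%x\n", (unsigned char)buf[sz - 1]);

	free(buf);
	return 0;
}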
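
And on the dispatch Usama describes: a throwaway helper (illustrative only, not
the zram code; ZCOMP_MULTI_PAGES_SIZE=64K is just the value used in the example)
that restates the size math, i.e. sub-64K folios take the single-page path while
a 128K folio produces two multi-page writes.

#include <stdio.h>

#define ZCOMP_MULTI_PAGES_SIZE	(64 * 1024)	/* assumed value for this example */

/* Returns 0 when the folio takes the single-page write path. */
static unsigned int multi_page_writes(size_t folio_bytes)
{
	if (folio_bytes < ZCOMP_MULTI_PAGES_SIZE)
		return 0;
	return folio_bytes / ZCOMP_MULTI_PAGES_SIZE;
}

int main(void)
{
	const size_t sizes[] = { 16 * 1024, 64 * 1024, 128 * 1024 };

	for (unsigned int i = 0; i < sizeof(sizes) / sizeof(sizes[0]); i++)
		printf("%zu KiB folio -> %u multi-page write(s)\n",
		       sizes[i] / 1024, multi_page_writes(sizes[i]));
	return 0;
}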