On Mon, Sep 23, 2024 at 5:10 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Mon, Sep 23, 2024 at 11:22:30AM +0100, Usama Arif wrote:
> > On 23/09/2024 00:57, Barry Song wrote:
> > > On Thu, Sep 5, 2024 at 7:36 AM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> > >>>> On the other hand, if you read the code of zRAM, you will find zRAM has
> > >>>> exactly the same mechanism as zeromap but zRAM can even do more
> > >>>> by same_pages filled. since zRAM does the job in swapfile layer, there
> > >>>> is no this kind of consistency issue like zeromap.
> > >>>>
> > >>>> So I feel for zRAM case, we don't need zeromap at all as there are duplicated
> > >>>> efforts while I really appreciate your job which can benefit all swapfiles.
> > >>>> i mean, zRAM has the ability to check "zero"(and also non-zero but same
> > >>>> content). after zeromap checks zeromap, zRAM will check again:
> > >>>
> > >>> Yes, so there is a reason for having the zeromap patches, which I have outlined
> > >>> in the coverletter.
> > >>>
> > >>> https://lore.kernel.org/all/20240627105730.3110705-1-usamaarif642@xxxxxxxxx/
> > >>>
> > >>> There are usecases where zswap/zram might not be used in production.
> > >>> We can reduce I/O and flash wear in those cases by a large amount.
> > >>>
> > >>> Also running in Meta production, we found that the number of non-zero filled
> > >>> complete pages were less than 1%, so essentially its only the zero-filled pages
> > >>> that matter.
> > >>>
> > >>> I believe after zeromap, it might be a good idea to remove the page_same_filled
> > >>> check from zram code? Its not really a problem if its kept as well as I dont
> > >>> believe any zero-filled pages should reach zram_write_page?
> > >>
> > >> I brought this up before and Sergey pointed out that zram is sometimes
> > >> used as a block device without swap, and that use case would benefit
> > >> from having this handling in zram. That being said, I have no idea how
> > >> many people care about this specific scenario.
> > >
> > > Hi Usama/Yosry,
> > >
> > > We successfully gathered page_same_filled data for zram on Android.
> > > Interestingly, our findings differ from yours on zswap.
> > >
> > > Hailong discovered that around 85-86% of the page_same_filled data
> > > consists of zeros, while about 15% are non-zero. We suspect that on
> > > Android or similar systems, some graphics or media data might be
> > > duplicated at times, such as a red block displayed on the screen.
> > >
> > > Does this suggest that page_same_filled could still provide some
> > > benefits in zram cases?
> >
> > Hi Barry,
> >
> > Thanks for the data, its very interesting to know this from mobile side.
> > Eventhough its not 99% that I observed, I do feel 85% is still quite high.
>
> Would it be possible to benchmark Android with zram only optimizing
> zero pages?
>
> diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
> index c3d245617083..f6ded491fd00 100644
> --- a/drivers/block/zram/zram_drv.c
> +++ b/drivers/block/zram/zram_drv.c
> @@ -211,6 +211,9 @@ static bool page_same_filled(void *ptr, unsigned long *element)
>  	page = (unsigned long *)ptr;
>  	val = page[0];
>
> +	if (val)
> +		return false;
> +
>  	if (val != page[last_pos])
>  		return false;
>
> My take is that, if this is worth optimizing for, then it's probably
> worth optimizing for in the generic swap layer too. It makes sense to
> maintain feature parity if we one day want Android to work with zswap.
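For anyone skimming, the hunk above sits in zram's page_same_filled().
With the zero-only change applied, the whole function would look roughly
like this (a sketch reconstructed from the hunk context, not verbatim
upstream code):

static bool page_same_filled(void *ptr, unsigned long *element)
{
	unsigned long *page;
	unsigned long val;
	unsigned int pos, last_pos = PAGE_SIZE / sizeof(*page) - 1;

	page = (unsigned long *)ptr;
	val = page[0];

	/* proposed change: only treat all-zero pages as same-filled */
	if (val)
		return false;

	/* cheap early exit: compare the first and last words first */
	if (val != page[last_pos])
		return false;

	/* scan the remaining words for any mismatch */
	for (pos = 1; pos < last_pos; pos++) {
		if (val != page[pos])
			return false;
	}

	/* with the change above, the recorded fill value is always zero */
	*element = val;
	return true;
}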
I am not sure it's worth it for the generic swap layer. We would need to
store 8 bytes per swap entry to maintain feature parity; that's about
0.2% of swap capacity as constant memory overhead, and swap capacity is
usually larger than the actual amount of swapped data.

IIUC the data you gathered from prod showed that ~1% of same-filled
pages were non-zero, and that 10-20% of swapped data was same-filled
[1]. That means roughly 0.15% of swapped data is non-zero same-filled.
With zswap, assuming a 3:1 compression ratio, we'd be paying 0.2% of
swap capacity to save around 0.05% of swapped data in memory. The real
saving is probably even smaller, because same-filled data likely
compresses better than 3:1. With SSD swap, I am not sure a 0.15%
reduction in IO is worth the memory overhead.

OTOH, zram keeps track of the same-filled value for free because it
overlays the zsmalloc handle (like zswap used to do), so the same
tradeoffs do not apply there.

Barry mentioned that 15% of same-filled pages are non-zero in their
Android experiment, but what % of total swapped memory is that, and how
much space would it take if we just compressed it instead? IOW, how much
memory is this really saving with zram (especially since that metadata
is statically allocated)?
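To make the back-of-envelope numbers above concrete, here is a tiny
stand-alone calculation (purely illustrative; the 4 KiB page size,
8-byte per-entry metadata, 15% same-filled share, 1% non-zero share and
3:1 ratio are the assumptions from this thread, not measurements):

#include <stdio.h>

int main(void)
{
	double page_size = 4096.0;       /* bytes per page / swap entry */
	double meta_per_entry = 8.0;     /* bytes of same-filled metadata per entry */
	double same_filled = 0.15;       /* ~15% of swapped data is same-filled */
	double nonzero_share = 0.01;     /* ~1% of same-filled pages are non-zero */
	double compression = 3.0;        /* assumed zswap compression ratio */

	double overhead = meta_per_entry / page_size;             /* metadata cost */
	double nonzero_same_filled = same_filled * nonzero_share; /* data affected */
	double saved = nonzero_same_filled / compression;         /* memory avoided */

	printf("metadata overhead: %.2f%% of swap capacity\n", overhead * 100);
	printf("non-zero same-filled: %.2f%% of swapped data\n", nonzero_same_filled * 100);
	printf("memory saved vs. just compressing: %.2f%% of swapped data\n", saved * 100);
	return 0;
}

Compiled and run, this prints roughly 0.20%, 0.15% and 0.05%, which is
where the figures above come from.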