Re: [PATCH] mm: zsmalloc: share slab caches for all zsmalloc zpools

Yosry Ahmed <yosryahmed@xxxxxxxxxx> · Fri, 7 Jun 2024 10:24:23 -0700

On Fri, Jun 7, 2024 at 9:14 AM Minchan Kim <minchan@xxxxxxxxxx> wrote:
>
> On Thu, Jun 06, 2024 at 04:03:55PM -0700, Yosry Ahmed wrote:
> > On Thu, Jun 6, 2024 at 3:36 PM Minchan Kim <minchan@xxxxxxxxxx> wrote:
> > >
> > > On Tue, Jun 04, 2024 at 05:53:40PM +0000, Yosry Ahmed wrote:
> > > > Zswap creates multiple zpools to improve concurrency. Each zsmalloc
> > > > zpool creates its own 'zs_handle' and 'zspage' slab caches. Currently we
> > > > end up with 32 slab caches of each type.
> > > >
> > > > Since each slab cache holds some free objects, we end up with a lot of
> > > > free objects distributed among the separate zpool caches. Slab caches
> > > > are designed to handle concurrent allocations by using percpu
> > > > structures, so having a single instance of each cache should be enough,
> > > > and avoids wasting more memory than needed due to fragmentation.
> > > >
> > > > Additionally, having more slab caches than needed unnecessarily slows
> > > > down code paths that iterate slab_caches.
> > > >
> > > > In the results reported by Eric in [1], the amount of unused slab memory
> > > > in these caches goes down from 242808 bytes to 29216 bytes (-88%). This
> > > > is calculated by (num_objs - active_objs) * objsize for each 'zs_handle'
> > > > and 'zspage' cache. Although this patch did not help with the allocation
> > > > failure reported by Eric with zswap + zsmalloc, I think it is still
> > > > worth merging on its own.
> > > >
> > > > [1]https://lore.kernel.org/lkml/20240604134458.3ae4396a@yea/
> > >
> > > I doubt this is the right direction.
> > >
> > > Zsmalloc is used for various purposes, each with different object
> > > lifecycles. For example, swap operations relatively involve short-lived
> > > objects, while filesystem use cases might have longer-lived objects.
> > > This mix of lifecycles could lead to fragmentation with this approach.
> >
> > Even in a swapfile, some objects can be short-lived and some objects
> > can be long-lived, and the line between swap and file systems both
> > becomes blurry with shmem/tmpfs. I don't think having separate caches
>
>
> Many allocators differentiate object lifecycles to minimize
> fragmentation. While this isn't a new concept, you argue it's irrelevant
> without a clearcut use case.
>
> > here is vital, but I am not generally familiar with the file system
> > use cases and I don't have data to prove/disprove it.
>
> The use case I had in mind was build output directories (e.g., Android).
> These consume object files in zram until the next build.
>
> Other potential scenarios involve separate zrams: one for foreground
> apps (short-term) and another for cached apps (long-term). Even
> zswap and zram could have different object lifecycles, as zswap might
> write back more aggressively.
>
> While you see no clear use cases, I disagree with dismissing this
> concept without strong justification.

I was just unaware of these use cases, as I mentioned. I didn't really
know how zram was used with file systems. Thanks for the examples :)

>
> >
> > >
> > > I believe the original problem arose when zsmalloc reduced its lock
> > > granularity from the class level to a global level. And then, Zswap went
> > > to mitigate the issue with multiple zpools, but it's essentially another
> > > bandaid on top of the existing problem, IMO.
> >
> > IIRC we reduced the granularity when we added writeback support to
> > zsmalloc, which was relatively recent. I think we have seen lock
> > contention with zsmalloc long before that. We have had a similar patch
> > internally to use multiple zpools in zswap for many years now.
> >
> > +Yu Zhao
> >
> > Yu has more historical context about this, I am hoping he will shed
> > more light about this.
> >
> > >
> > > The correct approach would be to further reduce the zsmalloc lock
> > > granularity.
> >
> > I definitely agree that the correct approach should be to fix the lock
> > contention at the source and drop zswap's usage of multiple zpools.
> > Nonetheless, I think this patch provides value in the meantime. The
> > fragmentation within the slab caches is real with zswap's use case.
> > OTOH, sharing a cache between swap and file system use cases leading
> > to fragmentation within the same slab cache is a less severe problem
> > in my opinion.
> >
> > That being said, I don't feel strongly. If you really don't like this
> > patch I am fine with dropping it.
>
> How about introducing a flag like "bool slab_merge" in zs_create_pool?
> This would allow zswap to unify slabs while others don't.

Yeah this should work. But I'll wait until we have more data and we
know whether we need to keep using multiple zpools for zswap.

I sent this patch because I thought it would be generally useful to
share caches (e.g. if we have zram and zswap on the same system),  but
given that you said it is actually preferable that the caches are
separate, it may not be. I'll wait for more data before sending any
more patches to address this.

Andrew, could you please take this out of mm-unstable?

Thanks!