On Fri, Jun 2, 2023 at 1:24 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Fri, Jun 02, 2023 at 12:14:28PM -0700, Yosry Ahmed wrote:
> > On Fri, Jun 2, 2023 at 11:34 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > >
> > > On Fri, Jun 02, 2023 at 09:59:20AM -0700, Yosry Ahmed wrote:
> > > > On Fri, Jun 2, 2023 at 9:49 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > > > > Again, what about the zswap_tree.lock and swap_info_struct.lock?
> > > > > They're the same scope unless you use multiple swap files. Would it
> > > > > make sense to tie pools to trees, so that using multiple swapfiles for
> > > > > concurrency purposes also implies this optimization?
> > > >
> > > > Yeah, using multiple swapfiles helps with those locks, but it doesn't
> > > > help with the zpool lock.
> > > >
> > > > I am reluctant to take this path because I am trying to get rid of
> > > > zswap's dependency on swapfiles to begin with, and have it act as its
> > > > own standalone swapping backend. If I am successful, then having one
> > > > zpool per zswap_tree is just a temporary fix.
> > >
> > > What about making the pools per-cpu?
> > >
> > > This would scale nicely with the machine size. And we commonly deal
> > > with for_each_cpu() loops and per-cpu data structures, so have good
> > > developer intuition about what's reasonable to squeeze into those.
> > >
> > > It would eliminate the lock contention, for everybody, right away, and
> > > without asking questions.
> > >
> > > It would open the door to all kinds of locking optimizations on top.
> >
> > The page can get swapped out on one cpu and swapped in on another, no?
> >
> > We will need to store which zpool the page is stored in in its zswap
> > entry, and potentially grab percpu locks from other cpus in the swap
> > in path. The lock contention would probably be less, but certainly not
> > eliminated.
> >
> > Did I misunderstand?
>
> Sorry, I should have been more precise.
>
> I'm saying that using NR_CPUS pools, and replacing the hash with
> smp_processor_id(), would accomplish your goal of pool concurrency.
> But it would do so with a broadly-used, well-understood scaling
> factor. We might not need a config option at all.
>
> The lock would still be there, but contention would be reduced fairly
> optimally (barring preemption) for store concurrency at least. Not
> fully eliminated due to frees and compaction, though, yes.

Yeah, I think we can do that. I looked at the size of the zsmalloc pool
as an example, and it seems to be less than 1K, so having one pool per
CPU seems okay.

There are a few things we will need to do:

- Rework zswap_update_total_size(). We don't want to loop through all
  cpus on each load/store. We can be smarter about it and inc/dec the
  total zswap pool size each time we allocate or free a page in the
  driver. This might need some plumbing from the drivers to zswap (or
  passing a callback from zswap to the drivers). Rough sketch below.

- Update zsmalloc such that all pools share kmem caches, instead of
  creating two kmem caches per zsmalloc pool. This was a follow-up I
  had in mind for multiple zpools support anyway, but I guess it's more
  significant if we have NR_CPUS pools.

I was nervous about increasing the size of struct zswap_entry to store
the cpu/zpool where the entry resides, but I realized we can replace
the pointer to zswap_pool in struct zswap_entry with a pointer to
zpool, and add a zswap_pool pointer in struct zpool. This will actually
trim down the common "entry->pool->zpool" to just "entry->zpool", and
then we can replace any "entry->pool" with "entry->zpool->pool".
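To make that last point concrete, here is a rough sketch of the struct
changes I have in mind (both structs heavily abridged; the back-pointer
name is made up, since struct zpool already uses "pool" for the
driver-private data):

/* mm/zpool.c -- abridged, with a hypothetical back-pointer added */
struct zpool {
	struct zpool_driver *driver;
	void *pool;			/* driver-private pool */
	struct zswap_pool *owner;	/* new: owning zswap_pool */
	/*
	 * Note: struct zpool is private to mm/zpool.c, so zswap would need a
	 * small accessor (or the struct moved to a header) to chase this.
	 */
};

/* mm/zswap.c -- abridged */
struct zswap_entry {
	/* rbnode, offset, length, refcount, objcg, etc. elided */
	struct zpool *zpool;		/* was: struct zswap_pool *pool */
	unsigned long handle;
};

With that, "entry->pool->zpool" becomes "entry->zpool", and the
remaining "entry->pool" users become "entry->zpool->owner" (or whatever
we end up calling the back-pointer).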
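And just so we're talking about the same thing on the NR_CPUS pools
idea, I'm picturing something along these lines (untested sketch, names
made up, the rest of zswap_pool and all error handling elided):

struct zswap_pool {
	struct zpool *zpools[NR_CPUS];	/* one zpool per possible CPU */
	/* acomp contexts, kref, list, etc. elided */
};

/* On store: replace the hash with the local CPU id. */
static struct zpool *zswap_zpool_for_store(struct zswap_pool *pool)
{
	/* raw_ to sidestep preemption warnings; the pick is only a hint */
	return pool->zpools[raw_smp_processor_id()];
}

Loads and frees still go to whichever zpool the entry was stored in
(hence keeping the zpool pointer in the entry), which may be another
CPU's pool -- so contention is reduced for stores but not eliminated,
as you said.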
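For the zswap_update_total_size() rework, what I mean by the callback
plumbing is roughly this (again just a sketch; the callback and its
registration are hypothetical, not an existing zpool interface):

/* Maintained by zswap instead of looping over every zpool. */
static atomic_long_t zswap_total_pages = ATOMIC_LONG_INIT(0);

/*
 * Hypothetical callback the zpool driver (e.g. zsmalloc) would invoke
 * whenever it allocates (+1) or frees (-1) a backing page for zswap.
 */
static void zswap_pages_notify(long nr_pages)
{
	atomic_long_add(nr_pages, &zswap_total_pages);
}

/* zswap_pool_total_size then becomes a plain read instead of a loop. */
static u64 zswap_total_size(void)
{
	return (u64)atomic_long_read(&zswap_total_pages) << PAGE_SHIFT;
}

The alternative is for zswap to adjust the counter around its own
zpool_malloc()/zpool_free() calls, but that tracks compressed bytes
rather than backing pages, which is why I think we need some help from
the driver.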
@Yu Zhao, any thoughts on this? The multiple zpools support was
initially your idea (and you did the initial implementation) -- so your
input is very valuable here.

>
> I'm not proposing more than that at this point. I only wrote the last
> line because already having per-cpu data structures might help with
> fast path optimizations down the line, if contention is still an
> issue. But unlikely. So it's not so important. Let's forget it.