On Fri, Jun 2, 2023 at 1:24 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Fri, Jun 02, 2023 at 12:14:28PM -0700, Yosry Ahmed wrote:
> > On Fri, Jun 2, 2023 at 11:34 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > >
> > > On Fri, Jun 02, 2023 at 09:59:20AM -0700, Yosry Ahmed wrote:
> > > > On Fri, Jun 2, 2023 at 9:49 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > > > > Again, what about the zswap_tree.lock and swap_info_struct.lock?
> > > > > They're the same scope unless you use multiple swap files. Would it
> > > > > make sense to tie pools to trees, so that using multiple swapfiles for
> > > > > concurrency purposes also implies this optimization?
> > > >
> > > > Yeah, using multiple swapfiles helps with those locks, but it doesn't
> > > > help with the zpool lock.
> > > >
> > > > I am reluctant to take this path because I am trying to get rid of
> > > > zswap's dependency on swapfiles to begin with, and have it act as its
> > > > own standalone swapping backend. If I am successful, then having one
> > > > zpool per zswap_tree is just a temporary fix.
> > >
> > > What about making the pools per-cpu?
> > >
> > > This would scale nicely with the machine size. And we commonly deal
> > > with for_each_cpu() loops and per-cpu data structures, so have good
> > > developer intuition about what's reasonable to squeeze into those.
> > >
> > > It would eliminate the lock contention, for everybody, right away, and
> > > without asking questions.
> > >
> > > It would open the door to all kinds of locking optimizations on top.
> >
> > The page can get swapped out on one cpu and swapped in on another, no?
> >
> > We will need to store which zpool the page is stored in in its zswap
> > entry, and potentially grab percpu locks from other cpus in the swap
> > in path. The lock contention would probably be less, but certainly not
> > eliminated.
> >
> > Did I misunderstand?
>
> Sorry, I should have been more precise.
>
> I'm saying that using NR_CPUS pools, and replacing the hash with
> smp_processor_id(), would accomplish your goal of pool concurrency.
> But it would do so with a broadly-used, well-understood scaling
> factor. We might not need a config option at all.
>
> The lock would still be there, but contention would be reduced fairly
> optimally (barring preemption) for store concurrency at least. Not
> fully eliminated due to frees and compaction, though, yes.

Yeah, I think we can do that. I looked at the size of the zsmalloc pool
as an example, and it seems to be less than 1K, so having one pool per
CPU seems okay.

There are a few things we will need to do:

- Rework zswap_update_total_size(). We don't want to loop through all
  cpus on each load/store. We can be smarter about it and inc/dec the
  total zswap pool size each time we allocate or free a page in the
  driver. This might need some plumbing from the drivers to zswap (or
  passing a callback from zswap to the drivers). Rough sketch below.

- Update zsmalloc such that all pools share kmem caches, instead of
  creating two kmem caches per zsmalloc pool. This was a follow-up I
  had in mind for multiple zpools support anyway, but I guess it's more
  significant if we have NR_CPUS pools.

I was nervous about increasing the size of struct zswap_entry to store
the cpu/zpool where the entry resides, but I realized we can replace
the pointer to zswap_pool in struct zswap_entry with a pointer to
zpool, and add a zswap_pool pointer in struct zpool. This will actually
trim down the common "entry->pool->zpool" to just "entry->zpool", and
then we can replace any "entry->pool" with "entry->zpool->pool".
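To make that last point concrete, here is a rough sketch of the struct
changes I have in mind (both structs heavily abridged; the back-pointer
name is made up, since struct zpool already uses "pool" for the
driver-private data):

/* mm/zpool.c -- abridged, with a hypothetical back-pointer added */
struct zpool {
	struct zpool_driver *driver;
	void *pool;			/* driver-private pool */
	struct zswap_pool *owner;	/* new: owning zswap_pool */
	/*
	 * Note: struct zpool is private to mm/zpool.c, so zswap would need a
	 * small accessor (or the struct moved to a header) to chase this.
	 */
};

/* mm/zswap.c -- abridged */
struct zswap_entry {
	/* rbnode, offset, length, refcount, objcg, etc. elided */
	struct zpool *zpool;		/* was: struct zswap_pool *pool */
	unsigned long handle;
};

With that, "entry->pool->zpool" becomes "entry->zpool", and the
remaining "entry->pool" users become "entry->zpool->owner" (or whatever
we end up calling the back-pointer).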
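And just so we're talking about the same thing on the NR_CPUS pools
idea, I'm picturing something along these lines (untested sketch, names
made up, the rest of zswap_pool and all error handling elided):

struct zswap_pool {
	struct zpool *zpools[NR_CPUS];	/* one zpool per possible CPU */
	/* acomp contexts, kref, list, etc. elided */
};

/* On store: replace the hash with the local CPU id. */
static struct zpool *zswap_zpool_for_store(struct zswap_pool *pool)
{
	/* raw_ to sidestep preemption warnings; the pick is only a hint */
	return pool->zpools[raw_smp_processor_id()];
}

Loads and frees still go to whichever zpool the entry was stored in
(hence keeping the zpool pointer in the entry), which may be another
CPU's pool -- so contention is reduced for stores but not eliminated,
as you said.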
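For the zswap_update_total_size() rework, what I mean by the callback
plumbing is roughly this (again just a sketch; the callback and its
registration are hypothetical, not an existing zpool interface):

/* Maintained by zswap instead of looping over every zpool. */
static atomic_long_t zswap_total_pages = ATOMIC_LONG_INIT(0);

/*
 * Hypothetical callback the zpool driver (e.g. zsmalloc) would invoke
 * whenever it allocates (+1) or frees (-1) a backing page for zswap.
 */
static void zswap_pages_notify(long nr_pages)
{
	atomic_long_add(nr_pages, &zswap_total_pages);
}

/* zswap_pool_total_size then becomes a plain read instead of a loop. */
static u64 zswap_total_size(void)
{
	return (u64)atomic_long_read(&zswap_total_pages) << PAGE_SHIFT;
}

The alternative is for zswap to adjust the counter around its own
zpool_malloc()/zpool_free() calls, but that tracks compressed bytes
rather than backing pages, which is why I think we need some help from
the driver.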
@Yu Zhao, any thoughts on this? The multiple zpools support was
initially your idea (and you did the initial implementation) -- so your
input is very valuable here.

>
> I'm not proposing more than that at this point. I only wrote the last
> line because already having per-cpu data structures might help with
> fast path optimizations down the line, if contention is still an
> issue. But unlikely. So it's not so important. Let's forget it.