Re: [LSF/MM/BPF TOPIC] Swap Abstraction / Native Zswap

Johannes Weiner <hannes@xxxxxxxxxxx> · Tue, 28 Mar 2023 10:14:46 -0400

On Tue, Mar 28, 2023 at 12:59:31AM -0700, Yosry Ahmed wrote:
> On Tue, Mar 28, 2023 at 12:01 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> > Yosry Ahmed <yosryahmed@xxxxxxxxxx> writes:
> > > We also have to unnecessarily limit the size of zswap with the size of
> > > this fake swapfile.
> >
> > I guess you need to limit the size of zswap anyway, because you need to
> > decide when to start writeback or move pages to the lower tiers.
> 
> zswap has a knob to limit its size, but based on the actual memory
> usage of zswap (i.e. the size of compressed pages). There is ongoing
> work as well to autotune this if I remember correctly. Having to deal
> with both the limit on compressed memory and the limit on the
> uncompressed size of swapped pages is cumbersome. Again, we already
> have this behavior today, but the initial swap_desc proposal aimed to
> avoid it.

Right.

The optimal size of the zswap pool on top of a swapfile depends on the
size and compressibility of the warm set of the workload: data that's
too cold for regular memory yet too hot for swap. This is obviously
highly dynamic, and even varies over time within individual jobs.

With this proposal, we'd have to provision a static swap map for the
highest expected offloading rate and compression ratio on every host
of a shared pool. On 256G machines that would put the fixed overhead
at a couple of hundred MB if I counted right.
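
For anyone who wants to redo that count, here is a rough back-of-the-envelope
sketch. Everything in it is an assumption for illustration, not something
taken from the proposal: 4 KiB pages, a swap map sized to cover 1x RAM worth
of uncompressed pages, and a few hypothetical per-slot metadata costs. The
helper name swap_map_overhead_bytes() is made up for this example.

```python
# Back-of-the-envelope estimate of the fixed cost of a static swap map.
# All inputs are illustrative assumptions, not numbers from the proposal.

PAGE_SIZE = 4096  # assumed 4 KiB base pages
GiB = 1 << 30
MiB = 1 << 20


def swap_map_overhead_bytes(ram_bytes, swap_to_ram_ratio, bytes_per_slot):
    """Metadata bytes needed for a swap map that is sized up front.

    ram_bytes:          physical memory on the host
    swap_to_ram_ratio:  how much uncompressed swap space to provision,
                        as a multiple of RAM (hypothetical sizing policy)
    bytes_per_slot:     metadata kept per page-sized swap slot (hypothetical)
    """
    slots = int(ram_bytes * swap_to_ram_ratio) // PAGE_SIZE
    return slots * bytes_per_slot


if __name__ == "__main__":
    # Try a few per-slot costs on a 256G host provisioned for 1x RAM.
    for bytes_per_slot in (1, 3, 8):
        cost = swap_map_overhead_bytes(256 * GiB, 1.0, bytes_per_slot)
        print(f"{bytes_per_slot} B/slot: {cost // MiB} MiB")
```

Under these assumptions a few bytes per slot lands in the low hundreds of MB,
and the number scales linearly with whatever peak offloading rate and
compression ratio you have to provision for.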

Not the end of the world I guess. And I agree it would make for
simpler initial patches. OTOH, it would add more quirks to the swap
code instead of cleaning it up. And given how common compressed memory
setups are nowadays, it still feels like it's trading off too far in
favor of regular swap setups at the expense of compression.

So it wouldn't be my first preference. But it sounds workable.