On Tue, Mar 28, 2023 at 12:59:31AM -0700, Yosry Ahmed wrote:
> On Tue, Mar 28, 2023 at 12:01 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> > Yosry Ahmed <yosryahmed@xxxxxxxxxx> writes:
> > > We also have to unnecessarily limit the size of zswap with the size of
> > > this fake swapfile.
> >
> > I guess you need to limit the size of zswap anyway, because you need to
> > decide when to start writeback or move to the lower tiers.
>
> zswap has a knob to limit its size, but based on the actual memory
> usage of zswap (i.e. the size of compressed pages). There is ongoing
> work as well to autotune this if I remember correctly. Having to deal
> with both the limit on compressed memory and the limit on the
> uncompressed size of swapped pages is cumbersome. Again, we already
> have this behavior today, but the initial swap_desc proposal aimed to
> avoid it.

Right.

The optimal size of the zswap pool on top of a swapfile depends on the
size and compressibility of the warm set of the workload: data that's
too cold for regular memory yet too hot for swap. This is obviously
highly dynamic, and even varies over time within individual jobs.

With this proposal, we'd have to provision a static swap map for the
highest expected offloading rate and compression ratio on every host
of a shared pool. On 256G machines that would put the fixed overhead
at a couple of hundred MB if I counted right.

Not the end of the world I guess. And I agree it would make for
simpler initial patches.

OTOH, it would add more quirks to the swap code instead of cleaning it
up. And given how common compressed memory setups are nowadays, it
still feels like it's trading off too far in favor of regular swap
setups at the expense of compression.

So it wouldn't be my first preference. But it sounds workable.
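
For reference, a rough back-of-the-envelope sketch of how a static swap
map could land in the "couple of hundred MB" range on a 256G machine.
The numbers here are assumptions for illustration, not figures from this
thread: 4 KiB base pages, a swap area sized to cover 256 GiB of
uncompressed pages, and a per-slot metadata cost that is left as a
parameter because the exact count depends on which per-slot structures
are included.

```python
PAGE_SIZE = 4096   # assumption: 4 KiB base pages
GIB = 1 << 30
MIB = 1 << 20


def static_swap_overhead_bytes(swap_bytes, bytes_per_slot):
    """Fixed metadata cost of a swap area with one slot per page.

    bytes_per_slot is a hypothetical parameter: the per-slot metadata
    cost is not stated in the thread.
    """
    slots = swap_bytes // PAGE_SIZE
    return slots * bytes_per_slot


# Assumption: the swap area is provisioned for 256 GiB of uncompressed
# pages, i.e. one slot per page of RAM on a 256G host.
for bytes_per_slot in (1, 3, 4):
    overhead = static_swap_overhead_bytes(256 * GIB, bytes_per_slot)
    print(f"{bytes_per_slot} B/slot -> {overhead // MIB} MiB")
```

Under these assumptions, a few bytes of metadata per slot already gives
an overhead in the low hundreds of MiB, and the cost is paid up front on
every host regardless of how much is actually offloaded.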