On Tue, Mar 28, 2023 at 7:14 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Tue, Mar 28, 2023 at 12:59:31AM -0700, Yosry Ahmed wrote:
> > On Tue, Mar 28, 2023 at 12:01 AM Huang, Ying <ying.huang@xxxxxxxxx> wrote:
> > > Yosry Ahmed <yosryahmed@xxxxxxxxxx> writes:
> > > > We also have to unnecessarily limit the size of zswap with the size of
> > > > this fake swapfile.
> > >
> > > I guess you need to limit the size of zswap anyway, because you need to
> > > decide when to start to writeback or moving to the lower tiers.
> >
> > zswap has a knob to limit its size, but based on the actual memory
> > usage of zswap (i.e the size of compressed pages). There is ongoing
> > work as well to autotune this if I remember correctly. Having to deal
> > with both the limit on compressed memory and the limited on the
> > uncompressed size of swapped pages is cumbersome. Again, we already
> > have this behavior today, but the initial swap_desc proposal aimed to
> > avoid it.
>
> Right.
>
> The optimal size of the zswap pool on top of a swapfile depends on the
> size and compressibility of the warm set of the workload: data that's
> too cold for regular memory yet too hot for swap. This is obviously
> highly dynamic, and even varies over time within individual jobs.
>
> With this proposal, we'd have to provision a static swap map for the
> highest expected offloading rate and compression ratio on every host
> of a shared pool. On 256G machines that would put the fixed overhead
> at a couple of hundred MB if I counted right.
>
> Not the end of the world I guess. And I agree it would make for
> simpler initial patches. OTOH, it would add more quirks to the swap
> code instead of cleaning it up. And given how common compressed memory
> setups are nowadays, it still feels like it's trading off too far in
> favor of regular swap setups at the expense of compression.

Right, I don't like adding more quirks to the swap code.
I guess for Android and ChromeOS, even though they are using compressed
memory, it is zram, not zswap, so any extra overhead from swap_descs for
normal swap setups would also affect Android -- so that's something to
think about.

>
> So it wouldn't be my first preference. But it sounds workable.

If we settle on this as a first step, perhaps to avoid any ABI changes
we can have the kernel create a virtual swap device for zswap if it is
enabled, without userspace interfering or having to do swapon on a
sparse swapfile like we do today with ghost swapfiles at Google. We can
then implement indirection logic that only supports moving pages between
swap devices -- and perhaps initially restrict it to supporting only the
virtual zswap swap device as the top tier.

The only user-visible effect would be that if the user has zswap enabled
and did not configure a swapfile, zswap would start compressing pages
regardless, but that's what we're hoping for anyway -- I wouldn't think
this is a breaking change.

This also wouldn't be my first preference, but it seems like a smaller
step from what we have today. As long as we don't have ABI dependencies
we can always come back and change it later I suppose.
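
For a rough sense of scale on the "couple of hundred MB" static swap map
estimate quoted above, here is a back-of-the-envelope sketch. It is not
necessarily how that number was counted. The assumptions are mine: 4 KiB
pages, a virtual swap device provisioned to cover 256 GiB of uncompressed
pages, and 3 bytes of static per-slot metadata (1 byte for swap_map plus
2 bytes for the swap cgroup id). The function name is made up for
illustration.

```python
# Back-of-the-envelope estimate of static per-slot swap metadata.
# All sizing inputs below are assumptions, not measured values.

PAGE_SIZE = 4096          # assumption: 4 KiB base pages
MiB = 1 << 20
GiB = 1 << 30


def static_swap_overhead_bytes(swap_bytes, bytes_per_slot):
    """Metadata needed to cover swap_bytes of uncompressed swap space."""
    slots = swap_bytes // PAGE_SIZE   # one swap slot per page
    return slots * bytes_per_slot


# Assumption: the swap map is provisioned for 256 GiB of uncompressed
# pages, with 1 byte (swap_map) + 2 bytes (swap cgroup id) per slot.
overhead = static_swap_overhead_bytes(256 * GiB, bytes_per_slot=3)
print(overhead // MiB, "MiB")
```

Provisioning for a higher expected compression ratio or offloading rate
scales the result linearly, since the map has to cover the uncompressed
size of everything that might be swapped out.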