On Thu, 7 Dec 2023 11:24:06 -0800 Nhat Pham <nphamcs@xxxxxxxxx> wrote: > During our experiment with zswap, we sometimes observe swap IOs due to > occasional zswap store failures and writebacks-to-swap. These swapping > IOs prevent many users who cannot tolerate swapping from adopting zswap > to save memory and improve performance where possible. > > This patch adds the option to disable this behavior entirely: do not > writeback to backing swapping device when a zswap store attempt fail, > and do not write pages in the zswap pool back to the backing swap > device (both when the pool is full, and when the new zswap shrinker is > called). > > This new behavior can be opted-in/out on a per-cgroup basis via a new > cgroup file. By default, writebacks to swap device is enabled, which is > the previous behavior. Initially, writeback is enabled for the root > cgroup, and a newly created cgroup will inherit the current setting of > its parent. > > Note that this is subtly different from setting memory.swap.max to 0, as > it still allows for pages to be stored in the zswap pool (which itself > consumes swap space in its current form). > > This patch should be applied on top of the zswap shrinker series: > > https://lore.kernel.org/linux-mm/20231130194023.4102148-1-nphamcs@xxxxxxxxx/ > > as it also disables the zswap shrinker, a major source of zswap > writebacks. > > ... > > --- a/Documentation/admin-guide/mm/zswap.rst > +++ b/Documentation/admin-guide/mm/zswap.rst > @@ -153,6 +153,12 @@ attribute, e. g.:: > > Setting this parameter to 100 will disable the hysteresis. > > +Some users cannot tolerate the swapping that comes with zswap store failures > +and zswap writebacks. Swapping can be disabled entirely (without disabling > +zswap itself) on a cgroup-basis as follows: > + > + echo 0 > /sys/fs/cgroup/<cgroup-name>/memory.zswap.writeback > + This does seem to be getting down into the weeds. How would a user know (or even suspect) that these things are happening to them? Perhaps it would be helpful to tell people where to go look to determine this. Also, it would be quite helpful of the changelog were to give us some idea of how important this tunable is. What sort of throughput differences might it cause and under what circumstances?