On Sun, Nov 19, 2023 at 1:39 AM Chris Li <chrisl@xxxxxxxxxx> wrote: > > On Sat, Nov 18, 2023 at 11:23 AM Nhat Pham <nphamcs@xxxxxxxxx> wrote: > > > > Hmm how about this - in the future, we support the following > > options: > > > > 1. zswap.writeback == 1: no limitation to zswap writeback. > > All backing swap devices (sorted by priorities?) are fair game. > > > > 2. zswap.writeback == 0: disable all forms of zswap writeback. > > > > 3. zswap.writeback == <tiers description>: attempt to write to each > > tier, one at a time. > > We can merge the zswap.writeback as it is for now to unblock you. > > For the future. I think we should remove zswap.writeback completely. I'm a bit weary about API changes, especially changes that affect backward compatibility. Breaking existing userspace programs simply with a kernel upgrade does not sound very nice to me. (although I've heard that the eventual plan is to deprecate cgroupv1 - not sure how that is gonna proceed). Hence my attempt at creating something that can both serve the current use case, while still remaining (fairly) extensible for future ideas. > > Instead we have: > > swap.tiers == <swap_tier_list_name> > swap.tiers == "all" all available swap tiers. "zswap + swap file". > This is the default. > swap.tiers == "zswap" zswap only, no other swap file. Internally set > zswap.writeback = 0 > swap.tiers == "foo" foo is a list of swap devices it can use. You can > define your town custom swap tier list in > swap.tiers == "none" or "disabled" Not allowed to swap. swap.tiers == "none" or "disabled" means disallowing zswap as well, correct? > > "all", "zswap", "none" are reserved keywords. > "foo", "bar" etc are custom lists of swap tiers. User define custom > tier list in sys/kernel/mm/swap/tiers: > ssd:zswap,/dev/nvme01p4 > hdd:/dev/sda4,/dev/sdb4 I don't have any major argument against this. It just seems a bit heavyweight for what we need at the moment (only disabling swap-to-disk usage). I'll let other people weigh in about this of course. Johannes, how do you feel about this proposed API? > > That would define two custom tiers. "ssd" can use zswap then /dev/nvme01p4. > The exact name of the "swap.tiers" and tiers name are open to suggestions. > > > > > The first two are basically what we have for this patch. > > The last one will be added in a future patch. > > > > This is from the userspace perspective. Internally, we can modify > > memcg->writeback to be a pointer or a struct instead of this bool. > > (as you suggested). > > Internally I would suggest memcg->swaptiers, the write back name is > somewhat confusing. As your patch indicated. It has two situation: > 1. shrinking from zpool to real swapfile. The write back is appropriate here. > 2. zswap store failed (compression ratio too low, out of memory etc). > The write back is confusing here. It is more like writing through or > skip. > > > > > This way, the API remains intact and backward compatible > > (and FWIW, I think there are still a lot of values in having simple > > options for the users who have simple memory hierarchies). > > swap.tiers can be simple. For example, you can modify your patch to > "swap.tires == zswap" to > set zswap.writeback bool to 0 for now. Most of your patch is still re-usable. > I'm less concerned about internals - that is always up to changes. I'm a bit more concerned with the API we're exposing to the users. > I think we should discuss if we want to keep zswap.writeback in the > future because that would be some code undeletable and functionally > overlap with swap.tiers This is a fair point. > > Chris