On Mon, Oct 16, 2023 at 5:58 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote: > > On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote: > > > > Currently, when a zswap store attempt fails, the page is immediately > > swapped out. This could happen for a variety of reasons. For instance, > > the compression algorithm could fail (such as when the data is not > > compressible), or the backend allocator might not be able to find a > > suitable slot for the compressed page. If these pages are needed > > later on, users will incur IOs from swapins. > > > > This issue prevents the adoption of zswap for potential users who > > cannot tolerate the latency associated with swapping. In many cases, > > these IOs are avoidable if we just keep in memory the pages that zswap > > fail to store. > > > > This patch series add two new features for zswap that will alleviate > > the risk of swapping: > > > > a) When a store attempt fail, keep the page untouched in memory > > instead of swapping it out. > > What about writeback when the zswap limit is hit? I understand the > problem, but I am wondering if this is the correct way of fixing it. > We really need to make zswap work without a backing swapfile, which I > think is the correct way to fix all these problems. I was working on > that, but unfortunately I had to pivot to something else before I had > something that was working. > > At Google, we have "ghost" swapfiles that we use just to use zswap > without a swapfile. They are sparse files, and we have internal kernel > patches to flag them and never try to actually write to them. > > I am not sure how many bandaids we can afford before doing the right > thing. I understand it's a much larger surgery, perhaps there is a way > to get a short-term fix that is also a step towards the final state we > want to reach instead? Regarding the writeback - I'll make sure to also short-circuit writeback when the bypass_swap option is enabled in v2 :) I'll probably send out v2 after I absolutely agree that we must decouple zswap and swap (and would be happy to help out in any capacity I could - we have heard similar concerns/complaints about swap wastage from internal parties as well). However, as Johannes has pointed out, this feature still has its place, given our already existing swapfile deployments. I do agree that a global knob is insufficient tho. I'll add a per-cgroup knob in v2 so that we can enable/disable this feature on a per-workload basis. > > > > > b) If the store attempt fails at the compression step, allow the page > > to be stored in its uncompressed form in the zswap pool. This maintains > > the LRU ordering of pages, which will be helpful for accurate > > memory reclaim (zswap writeback in particular). > > This is dangerous. Johannes and I discussed this before. This means > that reclaim can end up allocating more memory instead of freeing. > Allocations made in the reclaim path are made under the assumption > that we will eventually free memory. In this case, we won't. In the > worst case scenario, reclaim can leave the system/memcg in a worse > state than before it started. > > Perhaps there is a way we can do this without allocating a zswap entry? > > I thought before about having a special list_head that allows us to > use the lower bits of the pointers as markers, similar to the xarray. > The markers can be used to place different objects on the same list. > We can have a list that is a mixture of struct page and struct > zswap_entry. I never pursued this idea, and I am sure someone will > scream at me for suggesting it. Maybe there is a less convoluted way > to keep the LRU ordering intact without allocating memory on the > reclaim path. Hmm yeah you're right about these concerns. That seems like a lot more involved than what I envisioned initially. Let's put this aside for now. I'll just send the first patch in v2, and we can work on + discuss more about uncompressed pages storing later on. > > > > > These features could be enabled independently via two new zswap module > > parameters. > > > > Nhat Pham (2): > > swap: allows swap bypassing on zswap store failure > > zswap: store uncompressed pages when compression algorithm fails > > > > Documentation/admin-guide/mm/zswap.rst | 16 +++++++ > > include/linux/zswap.h | 9 ++++ > > mm/page_io.c | 6 +++ > > mm/shmem.c | 8 +++- > > mm/zswap.c | 64 +++++++++++++++++++++++--- > > 5 files changed, 95 insertions(+), 8 deletions(-) > > > > -- > > 2.34.1