On Tue, Oct 17, 2023 at 12:03 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Mon, Oct 16, 2023 at 5:58 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > On Mon, Oct 16, 2023 at 5:35 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> > >
> > > Currently, when a zswap store attempt fails, the page is immediately
> > > swapped out. This could happen for a variety of reasons. For instance,
> > > the compression algorithm could fail (such as when the data is not
> > > compressible), or the backend allocator might not be able to find a
> > > suitable slot for the compressed page. If these pages are needed
> > > later on, users will incur IOs from swapins.
> > >
> > > This issue prevents the adoption of zswap for potential users who
> > > cannot tolerate the latency associated with swapping. In many cases,
> > > these IOs are avoidable if we just keep in memory the pages that zswap
> > > fails to store.
> > >
> > > This patch series adds two new features for zswap that will alleviate
> > > the risk of swapping:
> > >
> > > a) When a store attempt fails, keep the page untouched in memory
> > > instead of swapping it out.
> >
> > What about writeback when the zswap limit is hit? I understand the
> > problem, but I am wondering if this is the correct way of fixing it.
> > We really need to make zswap work without a backing swapfile, which I
> > think is the correct way to fix all these problems. I was working on
> > that, but unfortunately I had to pivot to something else before I had
> > something that was working.
> >
> > At Google, we have "ghost" swapfiles that we use just to run zswap
> > without a real swapfile. They are sparse files, and we have internal
> > kernel patches to flag them and never try to actually write to them.
> >
> > I am not sure how many bandaids we can afford before doing the right
> > thing. I understand it's a much larger surgery; perhaps there is a way
> > to get a short-term fix that is also a step towards the final state we
> > want to reach instead?
>
> Regarding the writeback - I'll make sure to also short-circuit writeback
> when the bypass_swap option is enabled in v2 :) I'll probably send out
> v2 after I gather all this feedback.
>
> I absolutely agree that we must decouple zswap and swap (and would
> be happy to help out in any capacity I could - we have heard similar
> concerns/complaints about swap wastage from internal parties as well).
>
> However, as Johannes has pointed out, this feature still has its place,
> given our already existing swapfile deployments. I do agree that a
> global knob is insufficient, though. I'll add a per-cgroup knob in v2 so
> that we can enable/disable this feature on a per-workload basis.
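Just to be sure we mean the same thing by "short-circuit": I read the
bypass as swap_writepage() redirtying the page and handing it back to
reclaim, instead of falling through to disk, whenever the store fails.
Untested sketch - zswap_bypass_swap_enabled() is a made-up name rather
than whatever the series actually calls the knob, and the surrounding
code is from memory:

        if (zswap_store(&folio->page)) {
                /* Compressed copy is in the zswap pool; no disk IO. */
                folio_start_writeback(folio);
                folio_unlock(folio);
                folio_end_writeback(folio);
                return 0;
        }
        if (zswap_bypass_swap_enabled()) {
                /*
                 * Store failed, but the knob says never fall back to
                 * disk: redirty the folio and return with it still
                 * locked, so reclaim re-activates it - the same
                 * pattern shmem_writepage() uses when it cannot swap.
                 */
                folio_mark_dirty(folio);
                return AOP_WRITEPAGE_ACTIVATE;
        }
        __swap_writepage(&folio->page, wbc);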
> > > b) If the store attempt fails at the compression step, allow the page
> > > to be stored in its uncompressed form in the zswap pool. This maintains
> > > the LRU ordering of pages, which will be helpful for accurate
> > > memory reclaim (zswap writeback in particular).
> >
> > This is dangerous. Johannes and I discussed this before. This means
> > that reclaim can end up allocating more memory instead of freeing it.
> > Allocations made in the reclaim path are made under the assumption
> > that we will eventually free memory. In this case, we won't. In the
> > worst-case scenario, reclaim can leave the system/memcg in a worse
> > state than before it started.
> >
> > Perhaps there is a way we can do this without allocating a zswap entry?
> >
> > I thought before about having a special list_head that allows us to
> > use the lower bits of the pointers as markers, similar to the xarray.
> > The markers can be used to place different objects on the same list.
> > We can have a list that is a mixture of struct page and struct
> > zswap_entry. I never pursued this idea, and I am sure someone will
> > scream at me for suggesting it. Maybe there is a less convoluted way
> > to keep the LRU ordering intact without allocating memory on the
> > reclaim path.
>
> Hmm, yeah, you're right about these concerns. That seems a lot more
> involved than what I envisioned initially.
>
> Let's put this aside for now. I'll just send the first patch in v2, and
> we can work on + discuss storing uncompressed pages later on.
>
> > > These features could be enabled independently via two new zswap
> > > module parameters.
> > >
> > > Nhat Pham (2):
> > >   swap: allows swap bypassing on zswap store failure
> > >   zswap: store uncompressed pages when compression algorithm fails
> > >
> > >  Documentation/admin-guide/mm/zswap.rst | 16 +++++++
> > >  include/linux/zswap.h                  |  9 ++++
> > >  mm/page_io.c                           |  6 +++
> > >  mm/shmem.c                             |  8 +++-
> > >  mm/zswap.c                             | 64 +++++++++++++++++++++++---
> > >  5 files changed, 95 insertions(+), 8 deletions(-)
> > >
> > > --
> > > 2.34.1
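One more thought on the mixed list_head idea above, mostly for the
archives: the low-bit trick Yosry describes is the usual pointer
tagging. Illustration only - the names are invented and this is not
proposed kernel code; it just shows that list_head pointers are
word-aligned, so bit 0 is free to carry a type:

#include <linux/list.h>

#define LRU_TAG_PAGE    0x1UL

/* Tag a node with its type before storing it on the mixed LRU. */
static inline struct list_head *lru_tag(struct list_head *node, bool is_page)
{
        return (struct list_head *)((unsigned long)node |
                                    (is_page ? LRU_TAG_PAGE : 0));
}

/* Strip the tag to get the real node back, and report what it was. */
static inline struct list_head *lru_untag(struct list_head *tagged,
                                          bool *is_page)
{
        *is_page = (unsigned long)tagged & LRU_TAG_PAGE;
        return (struct list_head *)((unsigned long)tagged & ~LRU_TAG_PAGE);
}

The cost is that every walker has to untag and branch on the type
before it can do container_of(), so none of the stock list helpers
apply as-is - which is probably the "someone will scream at me" part.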