On Wed, Dec 20, 2023 at 6:50 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Wed, Dec 20, 2023 at 12:59:15AM -0800, Yosry Ahmed wrote:
> > On Tue, Dec 19, 2023 at 9:15 PM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
> > >
> > > On Mon, Dec 18, 2023 at 01:52:23PM -0800, Yosry Ahmed wrote:
> > > > > > Taking a step back from all the memory.swap.tiers vs.
> > > > > > memory.zswap.writeback discussions, I think there may be a more
> > > > > > fundamental problem here. If the zswap store failure is recurrent,
> > > > > > pages can keep going back to the LRUs and then get sent back to
> > > > > > zswap eventually, only to be rejected again. For example, this can
> > > > > > happen if zswap is above the acceptance threshold, but could be
> > > > > > even worse if it's the allocator rejecting the page due to not
> > > > > > compressing well enough. In the latter case, the page can keep
> > > > > > going back and forth between zswap and the LRUs indefinitely.
> > > > > >
> > > > > > You probably did not run into this as you're using zsmalloc, but it
> > > > > > can happen with zbud AFAICT. Even with zsmalloc, a less problematic
> > > > > > version can happen if zswap is above its acceptance threshold.
> > > > > >
> > > > > > This can cause thrashing and ineffective reclaim. We have an
> > > > > > internal implementation where we mark incompressible pages and put
> > > > > > them on the unevictable LRU when we don't have a backing swapfile
> > > > > > (i.e. ghost swapfiles), and something similar may work if writeback
> > > > > > is disabled. We need to scan such incompressible pages periodically
> > > > > > though to remove them from the unevictable LRU if they have been
> > > > > > dirtied.
> > > > >
> > > > > I'm not sure this is an actual problem.
> > > > >
> > > > > When pages get rejected, they rotate to the furthest point from the
> > > > > reclaimer - the head of the active list. We only get to them again
> > > > > after we have scanned everything else.
> > > > >
> > > > > If all that's left on the LRU is unzswappable, then you'd assume that
> > > > > remainder isn't very large, and thus not a significant part of the
> > > > > overall scan work. Because if it is, then there is a serious problem
> > > > > with the zswap configuration.
> > > > >
> > > > > There might be possible optimizations to determine how permanent a
> > > > > rejection is, but I'm not sure the effort is called for just
> > > > > yet. Rejections are already failure cases that screw up the LRU
> > > > > ordering, and healthy setups shouldn't have a lot of those. I don't
> > > > > think this patch adds any sort of new complications to this picture.
> > > >
> > > > We have workloads where a significant amount (maybe 20%? 30%? not sure
> > > > tbh) of the memory is incompressible. Zswap is still a very viable
> > > > option for those workloads once those pages are taken out of the
> > > > picture. If those pages remain on the LRUs, they will introduce a
> > > > regression in reclaim efficiency.
> > > >
> > > > With the upstream code today, those pages go directly to the backing
> > > > store, which isn't ideal in terms of LRU ordering, but this patch
> > > > makes them stay on the LRUs, which can be harmful. I don't think we
> > > > can just assume it is okay. Whether we make those pages unevictable or
> > > > store them uncompressed in zswap, I think taking them out of the LRUs
> > > > (until they are redirtied) is the right thing to do.
> > >
> > > This is how it works with zram as well, though, and it has plenty of
> > > happy users.
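(To make the above concrete: by "taking them out of the LRUs" I mean
something along the lines of the sketch below. This is just an
illustration, not our internal implementation; folio_set_incompressible()
is a hypothetical flag helper, and the actual LRU motion is hand-waved.)

	static void zswap_park_incompressible(struct folio *folio)
	{
		/*
		 * Hypothetical flag so that a periodic scan can find
		 * these folios and put them back on the regular LRUs
		 * once they have been redirtied.
		 */
		folio_set_incompressible(folio);

		/*
		 * PG_unevictable makes reclaim skip the folio. The
		 * actual move onto the unevictable LRU list (under the
		 * lruvec lock, mirroring what mlock does) is omitted
		 * here.
		 */
		folio_set_unevictable(folio);
	}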
> >
> > I am not sure I understand. Zram does not reject pages that do not
> > compress well, right? IIUC it acts as a block device, so it cannot
> > reject pages. I feel like I am missing something.
>
> zram_write_page() can fail for various reasons - compression failure,
> zsmalloc failure, the memory limit. This results in !!bio->bi_status,
> __end_swap_bio_write redirtying the page, and vmscan rotating it.
>
> The effect is actually more pronounced with zram, because the pages
> don't get activated and thus cycle faster.
>
> What you're raising doesn't seem to be a dealbreaker in practice.

For the workloads using zram, yes, they are exclusively using zsmalloc,
which can store incompressible pages anyway.

> > If we already want to support taking pages away from the LRUs when
> > they are rejected by zswap (e.g. Nhat's earlier proposal), doesn't it
> > make sense to do that first, so that this patch can be useful for all
> > workloads?
>
> No.
>
> Why should users who can benefit now wait for a hypothetical future
> optimization that isn't relevant to them? And, by the looks of it, one
> that is only relevant to a small set of specialized cases?
>
> And the optimization - should anybody actually care to write it - can
> be transparently done on top later, so that's no reason to change the
> merge order, either.

We can agree to disagree here; I am not trying to block this anyway.
But let's at least document this in the commit message/docs/code
(wherever it makes sense): that pages which recurrently fail to be
stored (e.g. incompressible memory) may keep going back to zswap only
to get rejected again, so workloads prone to this may observe some
reclaim inefficiency.
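(For reference, the failure path mentioned above - !!bio->bi_status on a
failed swap write - is handled roughly like this in mm/page_io.c. This
is paraphrased from memory, not a verbatim copy; the ratelimited
write-error warning is omitted:)

	static void __end_swap_bio_write(struct bio *bio)
	{
		struct folio *folio = bio_first_folio_all(bio);

		if (bio->bi_status) {
			/*
			 * The write failed: redirty the folio so its
			 * data is not lost; vmscan will then see a
			 * dirty folio and rotate it on the LRU.
			 */
			folio_mark_dirty(folio);
			/*
			 * Drop PG_reclaim so writeback completion does
			 * not move the folio to the LRU tail for
			 * immediate reclaim.
			 */
			folio_clear_reclaim(folio);
		}
		folio_end_writeback(folio);
	}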