On Tue, Oct 24, 2023 at 9:09 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote:
>
> On Mon, Oct 23, 2023 at 05:07:02PM -0700, Nhat Pham wrote:
> > Since:
> >
> > "42c06a0e8ebe mm: kill frontswap"
> >
> > we no longer have a counter that tracks the number of zswap store
> > failures. This makes it hard to investigate and monitor for zswap
> > issues.
> >
> > This patch adds a global and a per-cgroup zswap store failure
> > counter, as well as a dedicated debugfs counter for compression
> > algorithm failures (which can happen, e.g., when random data are
> > passed to zswap).
> >
> > Signed-off-by: Nhat Pham <nphamcs@xxxxxxxxx>
>
> I agree this is an issue.
>
> > ---
> >  include/linux/vm_event_item.h |  1 +
> >  mm/memcontrol.c               |  1 +
> >  mm/vmstat.c                   |  1 +
> >  mm/zswap.c                    | 18 ++++++++++++++----
> >  4 files changed, 17 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h
> > index 8abfa1240040..7b2b117b193d 100644
> > --- a/include/linux/vm_event_item.h
> > +++ b/include/linux/vm_event_item.h
> > @@ -145,6 +145,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT,
> >  #ifdef CONFIG_ZSWAP
> >  	ZSWPIN,
> >  	ZSWPOUT,
> > +	ZSWPOUT_FAIL,
>
> Would the writeback stat be sufficient to determine this?
>
> Hear me out. We already have pswpout that shows when we're hitting
> disk swap. Right now we can't tell if this is because of a rejection
> or because of writeback. With a writeback counter we could.

Oh I see! It's a bit of an extra step, but I suppose (pswpout -
writeback) could give us the number of zswap store failures.
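To make that derivation concrete, here's a rough userspace sketch (not
part of the patch). It uses zswap's existing written_back_pages debugfs
counter as the writeback number, and assumes debugfs is mounted at
/sys/kernel/debug and that zswap fronts all swap activity:

#include <stdio.h>
#include <string.h>

/* Read one named counter from /proc/vmstat; -1 if absent. */
static long long read_vmstat(const char *key)
{
	char name[64];
	long long val;
	FILE *f = fopen("/proc/vmstat", "r");

	if (!f)
		return -1;
	while (fscanf(f, "%63s %lld", name, &val) == 2) {
		if (!strcmp(name, key)) {
			fclose(f);
			return val;
		}
	}
	fclose(f);
	return -1;
}

int main(void)
{
	long long pswpout = read_vmstat("pswpout");
	long long wb = -1;
	FILE *f = fopen("/sys/kernel/debug/zswap/written_back_pages", "r");

	if (f) {
		if (fscanf(f, "%lld", &wb) != 1)
			wb = -1;
		fclose(f);
	}
	if (pswpout < 0 || wb < 0) {
		fprintf(stderr, "failed to read counters\n");
		return 1;
	}
	/* Anything hitting disk that wasn't writeback was a rejection. */
	printf("estimated zswap store failures: %lld\n", pswpout - wb);
	return 0;
}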
> And I think we want the writeback counter anyway going forward in
> order to monitor and understand the dynamic shrinker's performance.

Domenico and I were talking about this, and we both agree the
writeback counter is absolutely necessary - if anything, to make sure
that the shrinker is not a) completely broken or b) going overboard.
So it is coming as part of the shrinker regardless of this patch. I
just didn't realize that it also solves the issue we're having here!

> Either way we go, one of the metrics needs to be derived from the
> other(s). But I think subtle and not so subtle shrinker issues are
> more concerning than outright configuration problems where zswap
> doesn't work at all. The latter is easier to catch before or during
> early deployment with simple functionality tests.
>
> Plus, rejections should be rare. They are now, and they should become
> even more rare or cease to exist going forward. Because every time
> they happen at scale, they represent problematic LRU inversions. We
> have patched, have pending patches, or discussed changes to reduce
> every single one of them:
>
> /* Store failed due to a reclaim failure after pool limit was reached */
> static u64 zswap_reject_reclaim_fail;
>
> With the shrinker this becomes less relevant. There was also the
> proposal to disable the bypass to swap and just keep the page.

The shrinker and that proposal sound like good ideas ;)

> /* Compressed page was too big for the allocator to (optimally) store */
> static u64 zswap_reject_compress_poor;
>
> You were working on eradicating this (with zsmalloc at least).
>
> /* Store failed because underlying allocator could not get memory */
> static u64 zswap_reject_alloc_fail;
> /* Store failed because the entry metadata could not be allocated (rare) */
> static u64 zswap_reject_kmemcache_fail;
>
> These shouldn't happen at all due to PF_MEMALLOC.
>
> IOW, the fail counter is expected to stay zero in healthy,
> well-configured systems. Rather than an MM event that needs counting,
> this strikes me as something that could be a WARN down the line...

Yup, I agree that it should (mostly) stay at 0. It being non-zero
(especially at a high ratio relative to the total number of zswap
stores) is an indication that something is wrong - either a bug, a
misconfiguration, or a very ill-compressible workload (or, again, a
bug in the compression algorithm).

A WARN might be good too, but if it's just an ill-compressible
workload, that could generate too many WARNs :) But we can always just
monitor pswpout - writeback (both globally and on a per-cgroup basis,
I assume?).

> I agree with adding the debugfs counter though.

Then I'll send a new patch that focuses on the debugfs counter (for
compression failures). Thanks for the feedback, Johannes.
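P.S. Roughly what I have in mind for that follow-up - just a sketch,
and the counter/file names here are placeholders, not necessarily what
the final patch will use:

/* Count compression algorithm failures (e.g. incompressible data). */
static u64 zswap_reject_compress_fail;

/* In zswap_store(), after the compression call: */
	ret = crypto_wait_req(crypto_acomp_compress(acomp_ctx->req),
			      &acomp_ctx->wait);
	if (ret) {
		zswap_reject_compress_fail++;
		goto put_dstmem;
	}

/* In zswap_debugfs_init(), next to the other reject counters: */
	debugfs_create_u64("reject_compress_fail", 0444,
			   zswap_debugfs_root, &zswap_reject_compress_fail);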