Re: [External] Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios

Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> · Sat, 18 Nov 2023 09:45:55 +0800

Hi Chris, thanks for your time.

>
> On Fri, Nov 17, 2023 at 1:56 AM Zhongkun He
> <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
> > Hi Chris, thanks for your feedback.  I have the same concerns,
> > maybe we should just move the zswap_invalidate() out of batches,
> > as Yosry mentioned above.
>
> As I replied in the previous email, I just want to understand the
> other side effects of the change better.
>
> To me, this patching is actually freeing the memory that does not
> require actual page IO write from zswap. Which means the memory is
> from some kind of cache. It would be interesting if we can not
> complicate the write back path further. Instead, we can drop those
> memories from the different cache if needed. I assume those caches are
> doing something useful in the common case. If not, we should have a
> patch to remove these caches instead.  Not sure how big a mess it will
> be to implement separate the write and drop caches.
>
> While you are here, I have some questions for you.
>
> Can you help me understand how much memory you can free from this
> patch? For example, are we talking about a few pages or a few GB?
>
> Where does the freed memory come from?
> If the memory comes from zswap entry struct. Due to the slab allocator
> fragmentation. It would take a lot of zswap entries to have meaningful
> memory reclaimed from the slab allocator.
>
> If the memory comes from the swap cached pages, that would be much
> more meaningful. But that is not what this patch is doing, right?
>
> Chris

It's my bad for putting two cases together. The memory released in both
cases comes from zswap entry struct and zswap compressed page.

The original intention of this patch is to solve the problem that
shrink_work() fails to reclaim memory in two situations.

For case (1),  the zswap_writeback_entry() will failed for the
__read_swap_cache_async return NULL because the swap has been
freed but cached in swap_slots_cache, so the memory come from
the zswap entry struct and compressed page.
Count = SWAP_BATCH * ncpu.
Solution: move the zswap_invalidate() out of batches, free it once the swap
count equal to 0.

For case (2),  the zswap_writeback_entry() will failed for !page_was_allocated
because zswap_load will have two copies of the same page in memory
  (compressed and uncompressed) after faulting in a page from zswap when
zswap_exclusive_loads disabled. The amount of memory is greater but depends
on the usage.

Why do we need  to release them?
Consider this scenario,there is a lot of data cached in memory and zswap,
hit the limit，and shrink_worker will fail. The new coming data will be written
directly to swap due to zswap_store failure. Should we free the last one
to store the latest one in zswap.

According to the previous discussion, the writeback is inevitable.
So I want to make zswap_exclusive_loads_enabled the default behavior
or make it the only way to do zswap loads. It only makes sense when
the page is read and no longer dirty. If the page is read frequently, it
should stay in cache rather than zswap. The benefit of doing this is
very small, i.e. two copies of the same page in memory.

Thanks again.