Re: [External] Re: [PATCH] mm:zswap: fix zswap entry reclamation failure in two scenarios

Nhat Pham <nphamcs@xxxxxxxxx> · Sat, 18 Nov 2023 13:43:52 -0500

On Fri, Nov 17, 2023 at 8:46 PM Zhongkun He
<hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
>
> Hi Chris, thanks for your time.
>
> >
> > On Fri, Nov 17, 2023 at 1:56 AM Zhongkun He
> > <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
> > > Hi Chris, thanks for your feedback.  I have the same concerns,
> > > maybe we should just move the zswap_invalidate() out of batches,
> > > as Yosry mentioned above.
> >
> > As I replied in the previous email, I just want to understand the
> > other side effects of the change better.
> >
> > To me, this patch is actually freeing memory that does not require
> > an actual page IO write from zswap, which means the memory comes
> > from some kind of cache. It would be nice if we could avoid
> > complicating the writeback path further. Instead, we could drop that
> > memory from the respective caches if needed. I assume those caches are
> > doing something useful in the common case. If not, we should have a
> > patch to remove these caches instead. Not sure how big a mess it would
> > be to implement the write and the cache drop separately.
> >
> > While you are here, I have some questions for you.
> >
> > Can you help me understand how much memory you can free from this
> > patch? For example, are we talking about a few pages or a few GB?
> >
> > Where does the freed memory come from?
> > If the memory comes from the zswap entry struct, then due to slab
> > allocator fragmentation it would take a lot of zswap entries to get
> > meaningful memory reclaimed from the slab allocator.
> >
> > If the memory comes from the swap cached pages, that would be much
> > more meaningful. But that is not what this patch is doing, right?
> >
> > Chris
>
> It's my bad for putting the two cases together. The memory released in both
> cases comes from the zswap entry struct and the zswap compressed page.
>
> The original intention of this patch is to solve the problem that
> shrink_work() fails to reclaim memory in two situations.
>
> For case (1), zswap_writeback_entry() will fail because
> __read_swap_cache_async() returns NULL: the swap entry has been
> freed but is still cached in swap_slots_cache, so the memory comes from
> the zswap entry struct and the compressed page.
> Count = SWAP_BATCH * ncpu.
> Solution: move zswap_invalidate() out of the batches and free the entry
> once the swap count drops to 0.
>
> For case (2), zswap_writeback_entry() will fail on !page_was_allocated,
> because after faulting in a page from zswap with zswap_exclusive_loads
> disabled, zswap_load leaves two copies of the same page in memory
> (compressed and uncompressed). The amount of memory is greater but depends
> on the usage.
>
> Why do we need to release them?
> Consider this scenario: there is a lot of data cached in memory and zswap,
> zswap hits the limit, and shrink_worker fails. Newly incoming data will be
> written directly to swap because zswap_store fails. Should we free the old
> entry so that the latest one can be stored in zswap?

Shameless plug: zswap is much less likely to hit the limit (global or
cgroup) with the shrinker enabled ;) It proactively reclaims objects
well ahead of the limit.

It comes with its own can of worms, of course - it's unlikely to work
for all workloads in its current form, but perhaps it's worth
experimenting with/improving upon?

>
> According to the previous discussion, the writeback is inevitable.
> So I want to make zswap_exclusive_loads_enabled the default behavior
> or make it the only way to do zswap loads. Keeping the compressed copy
> after a load only makes sense when the page is read and not dirtied
> again. If the page is read frequently, it should stay in cache rather
> than zswap. The benefit of keeping it is very small, and the cost is
> two copies of the same page in memory.
>
> Thanks again.
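
For a rough sense of scale on case (1), here is a back-of-the-envelope
sketch of the "Count = SWAP_BATCH * ncpu" bound quoted above. The
numbers are assumptions, not measurements from this thread: SWAP_BATCH
is taken as 64, the page size as 4 KiB, and ENTRY_STRUCT_SIZE is a
hypothetical stand-in for sizeof(struct zswap_entry). Each stale entry
is assumed to pin at most one compressed page plus one entry struct.

```python
# Upper bound for case (1): zswap entries whose swap slots are freed but
# still sitting in the per-CPU swap slot caches.
# All three constants below are assumptions; adjust for your kernel/config.

SWAP_BATCH = 64          # assumed value of the kernel's SWAP_BATCH
PAGE_SIZE = 4096         # assumed page size in bytes
ENTRY_STRUCT_SIZE = 64   # hypothetical sizeof(struct zswap_entry)


def max_stale_entries(ncpu: int) -> int:
    """Count = SWAP_BATCH * ncpu, as stated for case (1)."""
    return SWAP_BATCH * ncpu


def max_stale_bytes(ncpu: int, compressed_size: int = PAGE_SIZE) -> int:
    """Bound on memory held: one compressed page plus one entry struct each."""
    if not 0 < compressed_size <= PAGE_SIZE:
        raise ValueError("compressed_size must be in (0, PAGE_SIZE]")
    return max_stale_entries(ncpu) * (compressed_size + ENTRY_STRUCT_SIZE)


if __name__ == "__main__":
    for ncpu in (8, 64, 256):
        worst = max_stale_bytes(ncpu)
        # Assumed ~3:1 compression ratio, purely for illustration.
        typical = max_stale_bytes(ncpu, PAGE_SIZE // 3)
        print(f"{ncpu:4d} CPUs: {max_stale_entries(ncpu):6d} entries, "
              f"<= {worst / 2**20:.2f} MiB worst case, "
              f"~{typical / 2**20:.2f} MiB at 3:1")
```

Under these assumptions the bound for case (1) stays in the MiB range
even on large machines, which is consistent with case (2) being the one
where "the amount of memory is greater".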