Re: [External] Re: [PATCH] mm: zswap: fix the lack of page lru flag in zswap_writeback_entry

Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> · Fri, 5 Jan 2024 22:10:06 +0800

>
> Hmm originally I was thinking of doing an (unconditional)
> lru_add_drain() outside of zswap_writeback_entry() - once in
> shrink_worker() and/or zswap_shrinker_scan(), before we write back any
> of the entries. Not sure if it would work/help here tho - haven't
> tested that idea yet.
>

The pages are allocated by __read_swap_cache_async() in
zswap_writeback_entry(), and each one must be newly allocated,
not already present in the swap cache.
Please see the code below from zswap_writeback_entry():

    page = __read_swap_cache_async(swpentry, GFP_KERNEL, mpol,
                NO_INTERLEAVE_INDEX, &page_was_allocated);
    if (!page) {
        goto fail;
    }
    /* Found an existing page, we raced with load/swapin */
    if (!page_was_allocated) {
        put_page(page);
        ret = -EEXIST;
        goto fail;
    }

So by the time we reach SetPageReclaim(page), the page has just been
allocated and is still sitting in the percpu batch; it has not been
added to the LRU yet.

Therefore, calling lru_add_drain() outside of zswap_writeback_entry()
does not help: the drain would run before the page is allocated.
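To make the ordering argument concrete, here is a minimal userspace sketch.
It is only a toy model, not kernel code: every toy_* name, the batch size,
and the rule "rotation needs both the reclaim flag and LRU membership" are
simplifying assumptions for illustration, and it assumes nothing else drains
the batch in between.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these are NOT the kernel's types or functions. */
#define TOY_BATCH_MAX 15	/* arbitrary size chosen for the sketch */

struct toy_page {
	bool on_lru;	/* stands in for PageLRU */
	bool reclaim;	/* stands in for PageReclaim */
};

struct toy_cpu_batch {
	struct toy_page *pages[TOY_BATCH_MAX];
	size_t nr;
};

/* Stands in for lru_add_drain(): move every batched page onto the LRU. */
static void toy_lru_add_drain(struct toy_cpu_batch *b)
{
	for (size_t i = 0; i < b->nr; i++)
		b->pages[i]->on_lru = true;
	b->nr = 0;
}

/* A fresh page is queued in the percpu batch and is not on the LRU yet. */
static void toy_alloc_page(struct toy_cpu_batch *b, struct toy_page *p)
{
	p->on_lru = false;
	p->reclaim = false;
	if (b->nr == TOY_BATCH_MAX)
		toy_lru_add_drain(b);
	assert(b->nr < TOY_BATCH_MAX);
	b->pages[b->nr++] = p;
}

/* Assumed rule: rotation to the LRU tail only happens for LRU pages. */
static bool toy_rotate_after_writeback(const struct toy_page *p)
{
	return p->reclaim && p->on_lru;
}

/* Drain in the caller, i.e. before the page is even allocated. */
static bool toy_writeback_drain_outside(void)
{
	struct toy_cpu_batch b = { .nr = 0 };
	struct toy_page p;

	toy_lru_add_drain(&b);		/* our page is not in the batch yet */
	toy_alloc_page(&b, &p);
	p.reclaim = true;		/* SetPageReclaim() equivalent */
	return toy_rotate_after_writeback(&p);
}

/* Drain after the allocation, right before setting the reclaim flag. */
static bool toy_writeback_drain_inside(void)
{
	struct toy_cpu_batch b = { .nr = 0 };
	struct toy_page p;

	toy_alloc_page(&b, &p);
	toy_lru_add_drain(&b);		/* now the new page reaches the LRU */
	p.reclaim = true;
	return toy_rotate_after_writeback(&p);
}
```

In this model toy_writeback_drain_outside() reports that the page is not
rotated, while toy_writeback_drain_inside() reports that it is, which is
the same ordering problem as described above.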

> >
> > New test:
> > This patch will add the execution of folio_rotate_reclaimable() (not executed
> > without this patch) and lru_add_drain(), including percpu lock competition.
> > I bind a new task to allocate memory and use the same batch lock to compete
> > with the target process, on the same CPU.
> > context:
> > 1: stress --vm 1 --vm-bytes 1g    (bind to cpu0)
> > 2: stress --vm 1 --vm-bytes 5g --vm-hang 0    (bind to cpu0)
> > 3: reclaim pages, and writeback 5G zswap_entry in cpu0 and node 0.
> >
> > Average time of five tests
> >
> > Base      patch      patch + compete
> > 4.947     5.0676     5.1336
> >           +2.4%      +3.7%
> > compete means: a new stress run in cpu0 to compete with the writeback process.
> >  PID USER  %CPU  %MEM    TIME+ COMMAND                     P
> > 1367 root  49.5   0.0  1:09.17 bash   (writeback)          0
> > 1737 root  49.5   2.2  0:27.46 stress (use percpu lock)    0
> >
> > around 2.4% increase in real time, including the execution of
> > folio_rotate_reclaimable() (not executed without this patch) and
> > lru_add_drain(), but no lock contention.
>
> Hmm looks like the regression is still there, no?

Yes, it cannot be eliminated.
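
For reference, the percentages in the table are the slowdown relative to
the Base column. A minimal sketch of that arithmetic (overhead_pct() is a
hypothetical helper name, used only for this illustration):

```c
#include <assert.h>

/* Relative slowdown of `measured` over `base`, in percent. */
static double overhead_pct(double base, double measured)
{
	assert(base > 0.0);
	return (measured - base) / base * 100.0;
}
```

With the averages above, overhead_pct(4.947, 5.0676) lands between 2.4 and
2.5, and overhead_pct(4.947, 5.1336) lands between 3.7 and 3.8.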

>
> >
> > around 1.3% additional increase in real time with lock contention on the same
> > cpu.
> >
> > There is another option here, which is not to move the page to the
> > tail of the inactive
> > list after end_writeback and delete the following code in
> > zswap_writeback_entry(),
> > which did not work properly. But the pages will not be released first.
> >
> > /* move it to the tail of the inactive list after end_writeback */
> > SetPageReclaim(page);
>
> Or only SetPageReclaim on pages on LRU?

No, all the pages are newly allocated and not on the LRU.

The patch should call lru_add_drain() directly and drop the
if statement.
The purpose of writing back the data is to release the page,
so I think it is necessary to fix this.

Thanks for your time, Nhat.