On Sun, Jan 7, 2024 at 1:59 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Sun, Jan 7, 2024 at 1:29 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> >
> > On Fri, Jan 5, 2024 at 6:10 AM Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
> > >
> > > > > There is another option here, which is not to move the page to
> > > > > the tail of the inactive list after end_writeback, and to delete
> > > > > the following code in zswap_writeback_entry(), which did not
> > > > > work properly. But the pages will not be released first.
> > > > >
> > > > > /* move it to the tail of the inactive list after end_writeback */
> > > > > SetPageReclaim(page);
> >
> > Ok, so I took a look at the patch that originally introduced this
> > piece of logic:
> >
> > https://github.com/torvalds/linux/commit/b349acc76b7f65400b85abd09a5379ddd6fa5a97
> >
> > Looks like it's not there for the sake of correctness, but only as a
> > best-effort optimization (reducing page scanning). If it doesn't bring
> > any benefit (i.e. due to the newly allocated page still being on the
> > per-cpu add batch), then we can consider removing it. After all, if
> > you're right and it's not really doing anything here - why bother.
> > Perhaps we can replace this with some other mechanism to avoid it
> > being scanned for reclaim.
>
> For instance, we can grab the local lock, look for the folio in the
> add batch and take the folio off it, then add it to the rotate batch
> instead? Not sure if this is doable within folio_rotate_reclaimable(),
> or whether you'll have to perform this manually yourself (and clear the
> PG_reclaim flag set here so that folio_end_writeback() doesn't try to
> handle it).
>
> There is still some overhead with this, but at least we don't have to
> *drain everything* (which looks like what lru_add_drain() ->
> lru_add_drain_cpu() is doing).
> The latter sounds expensive and unnecessary, whereas this is just one
> element addition and one element removal - and IIUC the size of the
> per-cpu add batch is capped at 15, so lookup + removal (if possible)
> shouldn't be too expensive?
>
> Just throwing ideas out there :)

Sorry for being late to the party. It seems to me that all of this
hassle can be avoided if lru_add_fn() did the right thing in this case
and added the folio to the tail of the lru directly. I am no expert in
how the page flags work here, but it seems like we can do something
like this in lru_add_fn():

	if (folio_test_reclaim(folio))
		lruvec_add_folio_tail(lruvec, folio);
	else
		lruvec_add_folio(lruvec, folio);

I think the main problem with this is that PG_reclaim is an alias of
PG_readahead, so readahead pages would also go to the tail of the lru,
which is probably not good.

A more intrusive alternative is to introduce a folio_add_lru_tail()
variant that always adds folios to the tail of the lru, and optionally
call that from __read_swap_cache_async() instead of folio_add_lru(),
based on a new boolean argument. The zswap code can set that boolean
argument during writeback to make sure newly allocated folios are
always added to the tail of the lru.
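
For concreteness, the intrusive alternative could look roughly like the
sketch below. This is an untested, illustrative sketch, not a patch:
folio_add_lru_tail() and the extra boolean argument are hypothetical
names, the existing parameters of __read_swap_cache_async() are elided,
and the lruvec locking helpers may not match the exact current API:

	/* Hypothetical tail-adding counterpart of folio_add_lru() */
	void folio_add_lru_tail(struct folio *folio)
	{
		struct lruvec *lruvec;

		/*
		 * Unlike folio_add_lru(), bypass the per-cpu add batch and
		 * splice the folio directly onto the tail of the lru, so
		 * no drain is needed for reclaim to see it.
		 */
		lruvec = folio_lruvec_lock_irq(folio);
		folio_set_lru(folio);
		lruvec_add_folio_tail(lruvec, folio);
		lruvec_unlock_irq(lruvec);
	}

	/*
	 * In zswap_writeback_entry(), the caller would then pass true for
	 * the new (hypothetical) last argument to request tail insertion:
	 */
	page = __read_swap_cache_async(swpentry, GFP_KERNEL, ...,
				       &page_was_allocated, true);

Since this never goes through the per-cpu add batch, it also sidesteps
the whole drain-vs-lookup question discussed above for this path, at the
cost of taking the lruvec lock once per written-back folio.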