On Sun, Jan 7, 2024 at 1:59 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
>
> On Sun, Jan 7, 2024 at 1:29 PM Nhat Pham <nphamcs@xxxxxxxxx> wrote:
> >
> > On Fri, Jan 5, 2024 at 6:10 AM Zhongkun He <hezhongkun.hzk@xxxxxxxxxxxxx> wrote:
> > >
> > > > > There is another option here, which is not to move the page to
> > > > > the tail of the inactive list after end_writeback, and to delete
> > > > > the following code in zswap_writeback_entry(), which did not
> > > > > work properly. But the pages will not be released first.
> > > > >
> > > > > /* move it to the tail of the inactive list after end_writeback */
> > > > > SetPageReclaim(page);
> >
> > Ok, so I took a look at the patch that originally introduced this
> > piece of logic:
> >
> > https://github.com/torvalds/linux/commit/b349acc76b7f65400b85abd09a5379ddd6fa5a97
> >
> > Looks like it's not there for the sake of correctness, but only as a
> > best-effort optimization (reducing page scanning). If it doesn't bring
> > any benefit (i.e. due to the newly allocated page still being on the
> > per-cpu add batch), then we can consider removing it. After all, if
> > you're right and it's not really doing anything here - why bother.
> > Perhaps we can replace this with some other mechanism to avoid it
> > being scanned for reclaim.
>
> For instance, we can grab the local lock, look for the folio in the
> add batch and take the folio off it, then add it to the rotate batch
> instead? Not sure if this is doable within folio_rotate_reclaimable(),
> or whether you'll have to perform this manually yourself (and clear the
> PG_reclaim flag set here so that folio_end_writeback() doesn't try to
> handle it).
>
> There is still some overhead with this, but at least we don't have to
> *drain everything* (which looks like what lru_add_drain() ->
> lru_add_drain_cpu() is doing).
> The latter sounds expensive and unnecessary, whereas this is just one
> element addition and one element removal - and IIUC the size of the
> per-cpu add batch is capped at 15, so lookup + removal (if possible)
> shouldn't be too expensive?
>
> Just throwing ideas out there :)

Sorry for being late to the party. It seems to me that all of this
hassle can be avoided if lru_add_fn() did the right thing in this case
and added the folio to the tail of the lru directly. I am no expert in
how the page flags work here, but it seems like we can do something
like this in lru_add_fn():

	if (folio_test_reclaim(folio))
		lruvec_add_folio_tail(lruvec, folio);
	else
		lruvec_add_folio(lruvec, folio);

I think the main problem with this is that PG_reclaim is an alias of
PG_readahead, so readahead pages would also go to the tail of the lru,
which is probably not good.

A more intrusive alternative is to introduce a folio_add_lru_tail()
variant that always adds folios to the tail of the lru, and optionally
call that from __read_swap_cache_async() instead of folio_add_lru(),
based on a new boolean argument. The zswap code can set that boolean
argument during writeback to make sure newly allocated folios are
always added to the tail of the lru.
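
For concreteness, the intrusive alternative could look roughly like the
sketch below. This is an untested, illustrative sketch, not a patch:
folio_add_lru_tail() and the extra boolean argument are hypothetical
names, the existing parameters of __read_swap_cache_async() are elided,
and the lruvec locking helpers may not match the exact current API:

	/* Hypothetical tail-adding counterpart of folio_add_lru() */
	void folio_add_lru_tail(struct folio *folio)
	{
		struct lruvec *lruvec;

		/*
		 * Unlike folio_add_lru(), bypass the per-cpu add batch and
		 * splice the folio directly onto the tail of the lru, so
		 * no drain is needed for reclaim to see it.
		 */
		lruvec = folio_lruvec_lock_irq(folio);
		folio_set_lru(folio);
		lruvec_add_folio_tail(lruvec, folio);
		lruvec_unlock_irq(lruvec);
	}

	/*
	 * In zswap_writeback_entry(), the caller would then pass true for
	 * the new (hypothetical) last argument to request tail insertion:
	 */
	page = __read_swap_cache_async(swpentry, GFP_KERNEL, ...,
				       &page_was_allocated, true);

Since this never goes through the per-cpu add batch, it also sidesteps
the whole drain-vs-lookup question discussed above for this path, at the
cost of taking the lruvec lock once per written-back folio.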