On Tue, Nov 15, 2022 at 8:07 PM Yin, Fengwei <fengwei.yin@xxxxxxxxx> wrote:
>
> On 11/16/2022 9:38 AM, Yu Zhao wrote:
> > The page reclaim isolates a batch of folios from the tail of one of
> > the LRU lists and works on those folios one by one. For a suitable
> > swap-backed folio, if the swap device is async, it queues that folio
> > for writeback. After the page reclaim finishes an entire batch, it
> > puts back the folios it queued for writeback to the head of the
> > original LRU list.
> >
> > In the meantime, the page writeback flushes the queued folios also by
> > batches. Its batching logic is independent from that of the page
> > reclaim. For each of the folios it writes back, the page writeback
> > calls folio_rotate_reclaimable() which tries to rotate a folio to the
> > tail.
> >
> > folio_rotate_reclaimable() only works for a folio after the page
> > reclaim has put it back. If an async swap device is fast enough, the
> > page writeback can finish with that folio while the page reclaim is
> > still working on the rest of the batch containing it. In this case,
> > that folio will remain at the head and the page reclaim will not retry
> > it before reaching there.
> >
> > This patch adds a retry to evict_folios(). After evict_folios() has
> > finished an entire batch and before it puts back folios it cannot free
> > immediately, it retries those that may have missed the rotation.
> >
> > Before this patch, ~60% of folios swapped to an Intel Optane missed
> > folio_rotate_reclaimable(). After this patch, ~99% of missed folios
> > were reclaimed upon retry.
> >
> > This problem affects relatively slow async swap devices like Samsung
> > 980 Pro much less and does not affect sync swap devices like zram or
> > zswap at all.
> >
> > Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
> > Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> > ---
> >  mm/vmscan.c | 48 +++++++++++++++++++++++++++++++++++++-----------
> >  1 file changed, 37 insertions(+), 11 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index 04d8b88e5216..dc6ebafa0a37 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -4971,10 +4971,13 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
> >  	int scanned;
> >  	int reclaimed;
> >  	LIST_HEAD(list);
> > +	LIST_HEAD(clean);
> >  	struct folio *folio;
> > +	struct folio *next;
> >  	enum vm_event_item item;
> >  	struct reclaim_stat stat;
> >  	struct lru_gen_mm_walk *walk;
> > +	bool skip_retry = false;
> >  	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
> >  	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
> >
> > @@ -4991,20 +4994,37 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
> >
> >  	if (list_empty(&list))
> >  		return scanned;
> > -
> > +retry:
> >  	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
> > +	sc->nr_reclaimed += reclaimed;
> >
> > -	list_for_each_entry(folio, &list, lru) {
> > -		/* restore LRU_REFS_FLAGS cleared by isolate_folio() */
> > -		if (folio_test_workingset(folio))
> > -			folio_set_referenced(folio);
> > +	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
> > +		if (!folio_evictable(folio)) {
> > +			list_del(&folio->lru);
> > +			folio_putback_lru(folio);
> > +			continue;
> > +		}
> dumb question:
> My understanding: unevictable folios were filtered out in sort_folios.
> So this is because folio could become unevictable during retry? Thanks.

If a folio is mlocked, it's unevictable.
A different thread can call mlock_folio() anytime, so we can see unevictable folios in sort_folios(), here before and after retry, and later in move_folios_to_lru(). In all these cases, we move mlocked folios to the (imaginary) unevictable LRU and __mlock_page() bails out early.
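
To make that concrete, here is the new check from the hunk above with a couple of comments added for this discussion (the comments are annotation only, not part of the patch):

	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
		if (!folio_evictable(folio)) {
			/*
			 * Another thread may have mlocked this folio after it
			 * was isolated, so recheck on the first pass and again
			 * on the retry. Mlocked folios go back to the
			 * unevictable LRU via folio_putback_lru() and are not
			 * retried.
			 */
			list_del(&folio->lru);
			folio_putback_lru(folio);
			continue;
		}
		/* evictable folios are handled by the rest of the loop */
	}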