On 11/16/2022 9:38 AM, Yu Zhao wrote:
> The page reclaim isolates a batch of folios from the tail of one of
> the LRU lists and works on those folios one by one. For a suitable
> swap-backed folio, if the swap device is async, it queues that folio
> for writeback. After the page reclaim finishes an entire batch, it
> puts back the folios it queued for writeback to the head of the
> original LRU list.
>
> In the meantime, the page writeback flushes the queued folios also by
> batches. Its batching logic is independent from that of the page
> reclaim. For each of the folios it writes back, the page writeback
> calls folio_rotate_reclaimable() which tries to rotate a folio to the
> tail.
>
> folio_rotate_reclaimable() only works for a folio after the page
> reclaim has put it back. If an async swap device is fast enough, the
> page writeback can finish with that folio while the page reclaim is
> still working on the rest of the batch containing it. In this case,
> that folio will remain at the head and the page reclaim will not retry
> it before reaching there.
>
> This patch adds a retry to evict_folios(). After evict_folios() has
> finished an entire batch and before it puts back folios it cannot free
> immediately, it retries those that may have missed the rotation.
>
> Before this patch, ~60% of folios swapped to an Intel Optane missed
> folio_rotate_reclaimable(). After this patch, ~99% of missed folios
> were reclaimed upon retry.
>
> This problem affects relatively slow async swap devices like Samsung
> 980 Pro much less and does not affect sync swap devices like zram or
> zswap at all.
>
> Fixes: ac35a4902374 ("mm: multi-gen LRU: minimal implementation")
> Signed-off-by: Yu Zhao <yuzhao@xxxxxxxxxx>
> ---
>  mm/vmscan.c | 48 +++++++++++++++++++++++++++++++++++++-----------
>  1 file changed, 37 insertions(+), 11 deletions(-)
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index 04d8b88e5216..dc6ebafa0a37 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -4971,10 +4971,13 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>  	int scanned;
>  	int reclaimed;
>  	LIST_HEAD(list);
> +	LIST_HEAD(clean);
>  	struct folio *folio;
> +	struct folio *next;
>  	enum vm_event_item item;
>  	struct reclaim_stat stat;
>  	struct lru_gen_mm_walk *walk;
> +	bool skip_retry = false;
>  	struct mem_cgroup *memcg = lruvec_memcg(lruvec);
>  	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
>
> @@ -4991,20 +4994,37 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>
>  	if (list_empty(&list))
>  		return scanned;
> -
> +retry:
>  	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
> +	sc->nr_reclaimed += reclaimed;
>
> -	list_for_each_entry(folio, &list, lru) {
> -		/* restore LRU_REFS_FLAGS cleared by isolate_folio() */
> -		if (folio_test_workingset(folio))
> -			folio_set_referenced(folio);
> +	list_for_each_entry_safe_reverse(folio, next, &list, lru) {
> +		if (!folio_evictable(folio)) {
> +			list_del(&folio->lru);
> +			folio_putback_lru(folio);
> +			continue;
> +		}

Dumb question: my understanding is that unevictable folios were already
filtered out in sort_folio(). Is this check here because a folio could
become unevictable during the retry?
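For reference, folio_evictable() is roughly the following (paraphrased
from my reading of mm/internal.h, so the exact upstream code may
differ):

	static inline bool folio_evictable(struct folio *folio)
	{
		bool ret;

		/* prevent the address_space from being freed under us */
		rcu_read_lock();
		ret = !mapping_unevictable(folio_mapping(folio)) &&
		      !folio_test_mlocked(folio);
		rcu_read_unlock();
		return ret;
	}

If that reading is right, an mlock() racing with reclaim could mark a
folio unevictable after sort_folio() has already run, which would make
this check necessary on the retry path.

Thanks.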
Regards
Yin, Fengwei

>
> -		/* don't add rejected pages to the oldest generation */
>  		if (folio_test_reclaim(folio) &&
> -		    (folio_test_dirty(folio) || folio_test_writeback(folio)))
> -			folio_clear_active(folio);
> -		else
> -			folio_set_active(folio);
> +		    (folio_test_dirty(folio) || folio_test_writeback(folio))) {
> +			/* restore LRU_REFS_FLAGS cleared by isolate_folio() */
> +			if (folio_test_workingset(folio))
> +				folio_set_referenced(folio);
> +			continue;
> +		}
> +
> +		if (skip_retry || folio_test_active(folio) || folio_test_referenced(folio) ||
> +		    folio_mapped(folio) || folio_test_locked(folio) ||
> +		    folio_test_dirty(folio) || folio_test_writeback(folio)) {
> +			/* don't add rejected folios to the oldest generation */
> +			set_mask_bits(&folio->flags, LRU_REFS_MASK | LRU_REFS_FLAGS,
> +				      BIT(PG_active));
> +			continue;
> +		}
> +
> +		/* retry folios that may have missed folio_rotate_reclaimable() */
> +		list_move(&folio->lru, &clean);
> +		sc->nr_scanned -= folio_nr_pages(folio);
>  	}
>
>  	spin_lock_irq(&lruvec->lru_lock);
> @@ -5026,7 +5046,13 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
>  	mem_cgroup_uncharge_list(&list);
>  	free_unref_page_list(&list);
>
> -	sc->nr_reclaimed += reclaimed;
> +	INIT_LIST_HEAD(&list);
> +	list_splice_init(&clean, &list);
> +
> +	if (!list_empty(&list)) {
> +		skip_retry = true;
> +		goto retry;
> +	}
>
>  	if (need_swapping && type == LRU_GEN_ANON)
>  		*need_swapping = true;