From: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>

As with legacy LRU management, direct reclaim threads can isolate an
excessive number of folios and then be rescheduled, which can lead to
unwanted generation updates as well as the same kind of thrashing seen
before. This commit throttles direct reclaim based on the numbers of
isolated and inactive folios.

This patch was verified by concurrently launching 8 costmem instances
(each mallocs and accesses 1GB of virtual memory) on a 5.5GB Android
system running v6.6. The system no longer hangs: adb shell recovers
within 10s, whereas it hung 100% of the time on mainline.

test script under Android14

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@xxxxxxxxxx>
---
v2: fix a possible unpaired spin_lock/unlock and commit message
---
---
 mm/vmscan.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 2e34de9cd0d4..13e5ed9060ad 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4481,6 +4481,7 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
 	int scanned;
 	int tier = -1;
 	DEFINE_MIN_SEQ(lruvec);
+	bool stalled = false;
 
 	/*
 	 * Try to make the obvious choice first, and if anon and file are both
@@ -4503,6 +4504,16 @@ static int isolate_folios(struct lruvec *lruvec, struct scan_control *sc, int sw
 	else
 		type = get_type_to_scan(lruvec, swappiness, &tier);
 
+	spin_unlock_irq(&lruvec->lru_lock);
+	while (unlikely(too_many_isolated(lruvec_pgdat(lruvec), type, sc))) {
+		if (stalled) {
+			spin_lock_irq(&lruvec->lru_lock);
+			return 0;
+		}
+		reclaim_throttle(lruvec_pgdat(lruvec), VMSCAN_THROTTLE_ISOLATED);
+	}
+	spin_lock_irq(&lruvec->lru_lock);
+
 	for (i = !swappiness; i < ANON_AND_FILE; i++) {
 		if (tier < 0)
 			tier = get_tier_idx(lruvec, type);
@@ -4550,8 +4561,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	if (list_empty(&list))
 		return scanned;
 retry:
+	__mod_node_page_state(lruvec_pgdat(lruvec), NR_ISOLATED_ANON + type, scanned);
 	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
 	sc->nr_reclaimed += reclaimed;
+	__mod_node_page_state(lruvec_pgdat(lruvec), NR_ISOLATED_ANON + type, -scanned);
 	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
 			scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);
-- 
2.25.1