On Sat, Aug 31, 2024 at 8:38 AM Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> wrote:
>
> On Thu, 29 Aug 2024 18:25:43 +0800 Jingxiang Zeng <jingxiangzeng.cas@xxxxxxxxx> wrote:
>
> > From: Zeng Jingxiang <linuszeng@xxxxxxxxxxx>
> >
> > Commit 14aa8b2d5c2e ("mm/mglru: don't sync disk for each aging cycle")
> > removed the opportunity to wake up flushers during the MGLRU page
> > reclamation process can lead to an increased likelihood of triggering
> > OOM when encountering many dirty pages during reclamation on MGLRU.
> >
> > This leads to premature OOM if there are too many dirty pages in cgroup:
> > Killed
> >
> > ...
> >
> > The flusher wake up was removed to decrease SSD wearing, but if we are
> > seeing all dirty folios at the tail of an LRU, not waking up the flusher
> > could lead to thrashing easily. So wake it up when a mem cgroups is
> > about to OOM due to dirty caches.
>
> Thanks, I'll queue this for testing and review. Could people please
> consider whether we should backport this into -stable kernels.
>

Hi Andrew,

Thanks for picking this up.

> > MGLRU still suffers OOM issue on latest mm tree, so the test is done
> > with another fix merged [1].
> >
> > Link: https://lore.kernel.org/linux-mm/CAOUHufYi9h0kz5uW3LHHS3ZrVwEq-kKp8S6N-MZUmErNAXoXmw@xxxxxxxxxxxxxx/ [1]
>
> This one is already queued for -stable.

I didn't see this in -unstable or -stable though, is there any other
repo or branch I missed?

Jingxiang is referring to this fix from Yu:

diff --git a/mm/vmscan.c b/mm/vmscan.c
index cfa839284b92..778bf5b7ef97 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -4320,7 +4320,7 @@ static bool sort_folio(struct lruvec *lruvec, struct folio *folio, struct scan_c
 	}
 
 	/* ineligible */
-	if (zone > sc->reclaim_idx || skip_cma(folio, sc)) {
+	if (!folio_test_lru(folio) || zone > sc->reclaim_idx || skip_cma(folio, sc)) {
 		gen = folio_inc_gen(lruvec, folio, false);
 		list_move_tail(&folio->lru, &lrugen->folios[gen][type][zone]);
 		return true;