The patch titled Subject: mm: disable top-tier fallback to reclaim on proactive reclaim has been added to the -mm mm-unstable branch. Its filename is mm-disable-top-tier-fallback-to-reclaim-on-proactive-reclaim.patch This patch will shortly appear at https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-disable-top-tier-fallback-to-reclaim-on-proactive-reclaim.patch This patch will later appear in the mm-unstable branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Before you just go and hit "reply", please: a) Consider who else should be cc'ed b) Prefer to cc a suitable mailing list as well c) Ideally: find the original patch on the mailing list and do a reply-to-all to that, adding suitable additional cc's *** Remember to use Documentation/process/submit-checklist.rst when testing your code *** The -mm tree is included into linux-next via the mm-everything branch at git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm and is updated there every 2-3 working days ------------------------------------------------------ From: Mina Almasry <almasrymina@xxxxxxxxxx> Subject: mm: disable top-tier fallback to reclaim on proactive reclaim Date: Thu, 1 Dec 2022 15:33:17 -0800 Reclaiming directly from top tier nodes breaks the aging pipeline of memory tiers. If we have a RAM -> CXL -> storage hierarchy, we should demote from RAM to CXL and from CXL to storage. If we reclaim a page from RAM, it means we 'demote' it directly from RAM to storage, bypassing potentially a huge amount of pages colder than it in CXL. However disabling reclaim from top tier nodes entirely would cause ooms in edge scenarios where lower tier memory is unreclaimable for whatever reason, e.g. memory being mlocked() or too hot to reclaim. In these cases we would rather the job run with a performance regression rather than it oom altogether. However, we can disable reclaim from top tier nodes for proactive reclaim. That reclaim is not real memory pressure, and we don't have any cause to be breaking the aging pipeline. Link: https://lkml.kernel.org/r/20221201233317.1394958-1-almasrymina@xxxxxxxxxx Signed-off-by: Mina Almasry <almasrymina@xxxxxxxxxx> Reviewed-by: "Huang, Ying" <ying.huang@xxxxxxxxx> Cc: Greg Thelen <gthelen@xxxxxxxxxx> Cc: Shakeel Butt <shakeelb@xxxxxxxxxx> Cc: Tim Chen <tim.c.chen@xxxxxxxxxxxxxxx> Cc: Wei Xu <weixugc@xxxxxxxxxx> Cc: Yang Shi <yang.shi@xxxxxxxxxxxxxxxxx> Cc: Yosry Ahmed <yosryahmed@xxxxxxxxxx> Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx> --- mm/vmscan.c | 27 ++++++++++++++++++++++++--- 1 file changed, 24 insertions(+), 3 deletions(-) --- a/mm/vmscan.c~mm-disable-top-tier-fallback-to-reclaim-on-proactive-reclaim +++ a/mm/vmscan.c @@ -2088,10 +2088,31 @@ keep: nr_reclaimed += demote_folio_list(&demote_folios, pgdat); /* Folios that could not be demoted are still in @demote_folios */ if (!list_empty(&demote_folios)) { - /* Folios which weren't demoted go back on @folio_list for retry: */ + /* + * Folios which weren't demoted go back on @folio_list. + */ list_splice_init(&demote_folios, folio_list); - do_demote_pass = false; - goto retry; + + /* + * goto retry to reclaim the undemoted folios in folio_list if + * desired. + * + * Reclaiming directly from top tier nodes is not often desired + * due to it breaking the LRU ordering: in general memory + * should be reclaimed from lower tier nodes and demoted from + * top tier nodes. + * + * However, disabling reclaim from top tier nodes entirely + * would cause ooms in edge scenarios where lower tier memory + * is unreclaimable for whatever reason, eg memory being + * mlocked or too hot to reclaim. We can disable reclaim + * from top tier nodes in proactive reclaim though as that is + * not real memory pressure. + */ + if (!sc->proactive) { + do_demote_pass = false; + goto retry; + } } pgactivate = stat->nr_activate[0] + stat->nr_activate[1]; _ Patches currently in -mm which might be from almasrymina@xxxxxxxxxx are mm-disable-top-tier-fallback-to-reclaim-on-proactive-reclaim.patch