On Mon, Oct 28, 2024 at 01:45:48PM -0700, Yang Shi wrote: > On Fri, Oct 25, 2024 at 7:17 AM Gregory Price <gourry@xxxxxxxxxx> wrote: > > > > When numa balancing is enabled with demotion, vmscan will call > > migrate_pages when shrinking LRUs. Successful demotions will > > cause node vmstat numbers to double-decrement, leading to an > > imbalanced page count. The result is dmesg output like such: > > > > $ cat /proc/sys/vm/stat_refresh > > > > [77383.088417] vmstat_refresh: nr_isolated_anon -103212 > > [77383.088417] vmstat_refresh: nr_isolated_file -899642 > > > > This negative value may impact compaction and reclaim throttling. > > > > The double-decrement occurs in the migrate_pages path: > > > > caller to shrink_folio_list decrements the count > > shrink_folio_list > > demote_folio_list > > migrate_pages > > migrate_pages_batch > > migrate_folio_move > > migrate_folio_done > > mod_node_page_state(-ve) <- second decrement > > > > This path happens for SUCCESSFUL migrations, not failures. Typically > > callers to migrate_pages are required to handle putback/accounting for > > failures, but this is already handled in the shrink code. > > AFAIK, MGLRU doesn't dec/inc this counter, so it is not > double-decrement for MGLRU. Maybe "imbalance update" is better? > Anyway, it is just a nit. I'd suggest capturing the MGLRU case in the > commit log too. > Gotcha, so yeah saying it's an imbalance fix is more accurate. So more accurate changelog is: [PATCH] vmscan,migrate: fix page count imbalance on node stats when demoting pages When numa balancing is enabled with demotion, vmscan will call migrate_pages when shrinking LRUs. migrate_pages will decrement the the node's isolated page count, leading to an imbalanced count when invoked from (MG)LRU code. The result is dmesg output like such: $ cat /proc/sys/vm/stat_refresh [77383.088417] vmstat_refresh: nr_isolated_anon -103212 [77383.088417] vmstat_refresh: nr_isolated_file -899642 This negative value may impact compaction and reclaim throttling. The following path produces the decrement: shrink_folio_list demote_folio_list migrate_pages migrate_pages_batch migrate_folio_move migrate_folio_done mod_node_page_state(-ve) <- decrement This path happens for SUCCESSFUL migrations, not failures. Typically callers to migrate_pages are required to handle putback/accounting for failures, but this is already handled in the shrink code. When accounting for migrations, instead do not decrement the count when the migration reason is MR_DEMOTION. As of v6.11, this demotion logic is the only source of MR_DEMOTION. > > > > Signed-off-by: Gregory Price <gourry@xxxxxxxxxx> > > Fixes: 26aa2d199d6f2 ("mm/migrate: demote pages during reclaim") > > Cc: stable@xxxxxxxxxxxxxxx > > Thanks for catching this. Reviewed-by: Yang Shi <shy828301@xxxxxxxxx> > > > --- > > mm/migrate.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/migrate.c b/mm/migrate.c > > index 923ea80ba744..e3aac274cf16 100644 > > --- a/mm/migrate.c > > +++ b/mm/migrate.c > > @@ -1099,7 +1099,7 @@ static void migrate_folio_done(struct folio *src, > > * not accounted to NR_ISOLATED_*. They can be recognized > > * as __folio_test_movable > > */ > > - if (likely(!__folio_test_movable(src))) > > + if (likely(!__folio_test_movable(src)) && reason != MR_DEMOTION) > > mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + > > folio_is_file_lru(src), -folio_nr_pages(src)); > > > > -- > > 2.43.0 > >