Gregory Price <gourry@xxxxxxxxxx> writes: > On Mon, Oct 28, 2024 at 01:45:48PM -0700, Yang Shi wrote: >> On Fri, Oct 25, 2024 at 7:17 AM Gregory Price <gourry@xxxxxxxxxx> wrote: >> > >> > When numa balancing is enabled with demotion, vmscan will call >> > migrate_pages when shrinking LRUs. Successful demotions will >> > cause node vmstat numbers to double-decrement, leading to an >> > imbalanced page count. The result is dmesg output like such: >> > >> > $ cat /proc/sys/vm/stat_refresh >> > >> > [77383.088417] vmstat_refresh: nr_isolated_anon -103212 >> > [77383.088417] vmstat_refresh: nr_isolated_file -899642 >> > >> > This negative value may impact compaction and reclaim throttling. >> > >> > The double-decrement occurs in the migrate_pages path: >> > >> > caller to shrink_folio_list decrements the count >> > shrink_folio_list >> > demote_folio_list >> > migrate_pages >> > migrate_pages_batch >> > migrate_folio_move >> > migrate_folio_done >> > mod_node_page_state(-ve) <- second decrement >> > >> > This path happens for SUCCESSFUL migrations, not failures. Typically >> > callers to migrate_pages are required to handle putback/accounting for >> > failures, but this is already handled in the shrink code. >> >> AFAIK, MGLRU doesn't dec/inc this counter, so it is not >> double-decrement for MGLRU. Maybe "imbalance update" is better? >> Anyway, it is just a nit. I'd suggest capturing the MGLRU case in the >> commit log too. >> > > Gotcha, so yeah saying it's an imbalance fix is more accurate. > > So more accurate changelog is: > > > [PATCH] vmscan,migrate: fix page count imbalance on node stats when demoting pages > > When numa balancing is enabled with demotion, vmscan will call > migrate_pages when shrinking LRUs. migrate_pages will decrement the > the node's isolated page count, leading to an imbalanced count when > invoked from (MG)LRU code. > > The result is dmesg output like such: > > $ cat /proc/sys/vm/stat_refresh > > [77383.088417] vmstat_refresh: nr_isolated_anon -103212 > [77383.088417] vmstat_refresh: nr_isolated_file -899642 > This negative value may impact compaction and reclaim throttling. > > The following path produces the decrement: > > shrink_folio_list > demote_folio_list > migrate_pages > migrate_pages_batch > migrate_folio_move > migrate_folio_done > mod_node_page_state(-ve) <- decrement I think that it may be better to mention the different behavior of LRU and MGLRU. But that's not a big deal, change it again only if you think it's necessary. -- Best Regards, Huang, Ying > This path happens for SUCCESSFUL migrations, not failures. Typically > callers to migrate_pages are required to handle putback/accounting for > failures, but this is already handled in the shrink code. > > When accounting for migrations, instead do not decrement the count > when the migration reason is MR_DEMOTION. As of v6.11, this demotion > logic is the only source of MR_DEMOTION. > > >> > >> > Signed-off-by: Gregory Price <gourry@xxxxxxxxxx> >> > Fixes: 26aa2d199d6f2 ("mm/migrate: demote pages during reclaim") >> > Cc: stable@xxxxxxxxxxxxxxx >> >> Thanks for catching this. Reviewed-by: Yang Shi <shy828301@xxxxxxxxx> >> >> > --- >> > mm/migrate.c | 2 +- >> > 1 file changed, 1 insertion(+), 1 deletion(-) >> > >> > diff --git a/mm/migrate.c b/mm/migrate.c >> > index 923ea80ba744..e3aac274cf16 100644 >> > --- a/mm/migrate.c >> > +++ b/mm/migrate.c >> > @@ -1099,7 +1099,7 @@ static void migrate_folio_done(struct folio *src, >> > * not accounted to NR_ISOLATED_*. They can be recognized >> > * as __folio_test_movable >> > */ >> > - if (likely(!__folio_test_movable(src))) >> > + if (likely(!__folio_test_movable(src)) && reason != MR_DEMOTION) >> > mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON + >> > folio_is_file_lru(src), -folio_nr_pages(src)); >> > >> > -- >> > 2.43.0 >> >