On Mon, May 6, 2024 at 1:31 PM David Hildenbrand <david@xxxxxxxxxx> wrote:
>
> On 06.05.24 22:18, Yosry Ahmed wrote:
> > On Mon, May 06, 2024 at 09:50:10PM +0200, David Hildenbrand wrote:
> >> On 06.05.24 21:29, Yosry Ahmed wrote:
> >>> Previously, all NR_VM_EVENT_ITEMS stats were maintained per-memcg,
> >>> although some of those fields are not exposed anywhere. Commit
> >>> 14e0f6c957e39 ("memcg: reduce memory for the lruvec and memcg stats")
> >>> changed this such that we only maintain the stats we actually expose
> >>> per-memcg via a translation table.
> >>>
> >>> Additionally, commit 514462bbe927b ("memcg: warn for unexpected events
> >>> and stats") added a warning if a per-memcg stat update is attempted for
> >>> a stat that is not in the translation table. The warning started firing
> >>> for the NR_{FILE/SHMEM}_PMDMAPPED stat updates in the rmap code. These
> >>> stats are not maintained per-memcg, and hence are not in the translation
> >>> table.
> >>>
> >>> Do not use __lruvec_stat_mod_folio() when updating NR_FILE_PMDMAPPED and
> >>> NR_SHMEM_PMDMAPPED. Use __mod_node_page_state() instead, which updates
> >>> the global per-node stats only.
> >>>
> >>> Reported-by: syzbot+9319a4268a640e26b72b@xxxxxxxxxxxxxxxxxxxxxxxxx
> >>> Closes: https://lore.kernel.org/lkml/0000000000001b9d500617c8b23c@xxxxxxxxxx
> >>> Fixes: 514462bbe927 ("memcg: warn for unexpected events and stats")
> >>> Acked-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
> >>> Signed-off-by: Yosry Ahmed <yosryahmed@xxxxxxxxxx>
> >>> ---
> >>>   mm/rmap.c | 15 +++++++++------
> >>>   1 file changed, 9 insertions(+), 6 deletions(-)
> >>>
> >>> diff --git a/mm/rmap.c b/mm/rmap.c
> >>> index 12be4241474ab..ed7f820369864 100644
> >>> --- a/mm/rmap.c
> >>> +++ b/mm/rmap.c
> >>> @@ -1435,13 +1435,14 @@ static __always_inline void __folio_add_file_rmap(struct folio *folio,
> >>>  		struct page *page, int nr_pages, struct vm_area_struct *vma,
> >>>  		enum rmap_level level)
> >>>  {
> >>> +	pg_data_t *pgdat = folio_pgdat(folio);
> >>>  	int nr, nr_pmdmapped = 0;
> >>>
> >>>  	VM_WARN_ON_FOLIO(folio_test_anon(folio), folio);
> >>>
> >>>  	nr = __folio_add_rmap(folio, page, nr_pages, level, &nr_pmdmapped);
> >>>  	if (nr_pmdmapped)
> >>> -		__lruvec_stat_mod_folio(folio, folio_test_swapbacked(folio) ?
> >>> +		__mod_node_page_state(pgdat, folio_test_swapbacked(folio) ?
> >>>  			NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED, nr_pmdmapped);
> >>>  	if (nr)
> >>>  		__lruvec_stat_mod_folio(folio, NR_FILE_MAPPED, nr);
> >>> @@ -1493,6 +1494,7 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>  		enum rmap_level level)
> >>>  {
> >>>  	atomic_t *mapped = &folio->_nr_pages_mapped;
> >>> +	pg_data_t *pgdat = folio_pgdat(folio);
> >>>  	int last, nr = 0, nr_pmdmapped = 0;
> >>>  	bool partially_mapped = false;
> >>>  	enum node_stat_item idx;
> >>> @@ -1540,13 +1542,14 @@ static __always_inline void __folio_remove_rmap(struct folio *folio,
> >>>  	}
> >>>
> >>>  	if (nr_pmdmapped) {
> >>> +		/* NR_{FILE/SHMEM}_PMDMAPPED are not maintained per-memcg */
> >>>  		if (folio_test_anon(folio))
> >>> -			idx = NR_ANON_THPS;
> >>> -		else if (folio_test_swapbacked(folio))
> >>> -			idx = NR_SHMEM_PMDMAPPED;
> >>> +			__lruvec_stat_mod_folio(folio, NR_ANON_THPS, -nr_pmdmapped);
> >>>  		else
> >>> -			idx = NR_FILE_PMDMAPPED;
> >>> -		__lruvec_stat_mod_folio(folio, idx, -nr_pmdmapped);
> >>> +			__mod_node_page_state(pgdat,
> >>
> >> folio_pgdat(folio) should fit here easily. :)
> >>
> >> But I would actually suggest something like the following in mm/rmap.c
>
> > I am not a big fan of this. Not because I don't like the abstraction,
> > but because I think it doesn't go all the way. It abstracts a very
> > specific case: updating nr_pmdmapped for file folios.
>
> Right. It only removes some of the ugliness ;)

I think if we do this we just add one unnecessary layer of indirection
to one case. If anything, people will wonder why we have a helper only
for this case. Just my 2c :)

> > I think if we opt for abstracting the stats updates in mm/rmap.c, we
> > should go all the way with something like the following (probably split
> > as two patches: refactoring then bug fix). WDYT about the below?
> >
> > diff --git a/mm/rmap.c b/mm/rmap.c
> > index 12be4241474ab..70d6f6309da01 100644
> > --- a/mm/rmap.c
> > +++ b/mm/rmap.c
> > @@ -1269,6 +1269,28 @@ static void __page_check_anon_rmap(struct folio *folio, struct page *page,
> >  			page);
> >  }
> >
> > +static void __folio_mod_stat(struct folio *folio, int nr, int nr_pmdmapped)
> > +{
> > +	int idx;
> > +
> > +	if (nr) {
> > +		idx = folio_test_anon(folio) ? NR_ANON_MAPPED : NR_FILE_MAPPED;
> > +		__lruvec_stat_mod_folio(folio, idx, nr);
> > +	}
> > +	if (nr_pmdmapped) {
> > +		if (folio_test_anon(folio)) {
> > +			idx = NR_ANON_THPS;
> > +			__lruvec_stat_mod_folio(folio, idx, nr_pmdmapped);
> > +		} else {
> > +			/* NR_*_PMDMAPPED are not maintained per-memcg */
> > +			idx = folio_test_swapbacked(folio) ?
> > +				NR_SHMEM_PMDMAPPED : NR_FILE_PMDMAPPED;
> > +			__mod_node_page_state(folio_pgdat(folio), idx,
> > +					      nr_pmdmapped);
> > +		}
> > +	}
> > +}
> > +
>
> I didn't suggest that, because in the _anon and _file functions we'll
> end up introducing unnecessary folio_test_anon() checks that the
> compiler cannot optimize out.

I convinced myself that the folio_test_anon() will be free because the
struct folio should already be in the cache at this point, but of
course I may be delusional :)

We can pass in an @anon boolean parameter, but it becomes an ugliness
tradeoff at this point :)

Anyway, I don't feel strongly either way. I am fine with keeping the
patch as-is, the diff I proposed above, or the diff I proposed with an
@anon parameter instead of folio_test_anon(). The only option I don't
really like is adding a helper just for the file pmdmapped case.

> But at least in the removal path it's a clear win.
>
> --
> Cheers,
>
> David / dhildenb
>