On 14 Mar 2025, at 17:49, David Hildenbrand wrote:

> On 14.03.25 14:36, Matthew Wilcox (Oracle) wrote:
>> Folios always use memcg_data to refer to the mem_cgroup while pages
>> allocated with GFP_ACCOUNT have a pointer to the obj_cgroup. Since the
>> caller already knows what it has, split the function into two and then
>> we don't need to check.
>>
>> Move the assignment of split folio memcg_data to the point where we set
>> up the other parts of the new folio. That leaves folio_split_memcg_refs()
>> just handling the memcg accounting.
>>
>> Signed-off-by: Matthew Wilcox (Oracle) <willy@xxxxxxxxxxxxx>
>> Acked-by: Johannes Weiner <hannes@xxxxxxxxxxx>
>> Acked-by: Shakeel Butt <shakeel.butt@xxxxxxxxx>
>> Acked-by: Zi Yan <ziy@xxxxxxxxxx>
>> ---
>>  include/linux/memcontrol.h |  7 +++++++
>>  mm/huge_memory.c           | 16 ++++------------
>>  mm/memcontrol.c            | 17 +++++++++++++----
>>  3 files changed, 24 insertions(+), 16 deletions(-)
>>
>> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
>> index 57664e2a8fb7..d090089c5497 100644
>> --- a/include/linux/memcontrol.h
>> +++ b/include/linux/memcontrol.h
>> @@ -1039,6 +1039,8 @@ static inline void memcg_memory_event_mm(struct mm_struct *mm,
>>  }
>>  void split_page_memcg(struct page *head, int old_order, int new_order);
>> +void folio_split_memcg_refs(struct folio *folio, unsigned old_order,
>> +		unsigned new_order);
>>  static inline u64 cgroup_id_from_mm(struct mm_struct *mm)
>>  {
>> @@ -1463,6 +1465,11 @@ static inline void split_page_memcg(struct page *head, int old_order, int new_or
>>  {
>>  }
>> +static inline void folio_split_memcg_refs(struct folio *folio,
>> +		unsigned old_order, unsigned new_order)
>> +{
>> +}
>> +
>>  static inline u64 cgroup_id_from_mm(struct mm_struct *mm)
>>  {
>>  	return 0;
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index 14b1963898a7..3e5ecc8f3d13 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -3394,6 +3394,9 @@ static void __split_folio_to_order(struct folio *folio, int old_order,
>>  			folio_set_young(new_folio);
>>  		if (folio_test_idle(folio))
>>  			folio_set_idle(new_folio);
>> +#ifdef CONFIG_MEMCG
>> +		new_folio->memcg_data = folio->memcg_data;
>> +#endif
>>  		folio_xchg_last_cpupid(new_folio, folio_last_cpupid(folio));
>>  	}
>> @@ -3525,18 +3528,7 @@ static int __split_unmapped_folio(struct folio *folio, int new_order,
>>  			}
>>  		}
>> -		/*
>> -		 * Reset any memcg data overlay in the tail pages.
>> -		 * folio_nr_pages() is unreliable until prep_compound_page()
>> -		 * was called again.
>> -		 */
>> -#ifdef NR_PAGES_IN_LARGE_FOLIO
>> -		folio->_nr_pages = 0;
>> -#endif
>
>
> I remember that we could trigger a warning without that, but I don't
> immediately find where that warning was. IIRC, if we'd split to order-0,
> page[1] would have indicated that it had a memcg set, and something
> bailed out.
>
> Maybe Zi Yan recalls where that check fired.

The error I encountered is different. When I rebased my folio_split() on
top of David’s mapcount patchset, my original patch used folio_nr_pages()
after the memcg split. Since memcg_data overlays _nr_pages, when splitting
to order-0, folio->_nr_pages is overwritten with memcg_data, causing
folio_nr_pages() to return a bogus value. With this patch from Matthew,
memcg_data of page[1] is written inside __split_folio_to_order(), so in
theory __split_folio_to_order() can call folio_nr_pages() like my original
patch did.

As for folio->_nr_pages = 0, I suppose it is trying to suppress any
page[1]->memcg_data != NULL check in the following code. But I could not
find any.

>
> In any case, if that warning no longer fires this is a very nice cleanup!

Yeah, if we see any warning on memcg later, we know how to fix it. :)

Best Regards,
Yan, Zi
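
P.S. To make the overlay problem described above concrete, here is a minimal user-space sketch. It is not kernel code: struct toy_page, its union layout, toy_nr_pages() and toy_split_to_order0() are hypothetical stand-ins, and the real struct page / struct folio layout differs. It only assumes what the thread states, namely that page[1]'s memcg_data shares storage with the folio's _nr_pages.

```c
#include <assert.h>

/*
 * Toy stand-in for struct page (hypothetical layout): in page[1] the
 * same word is read either as memcg_data or as the folio's _nr_pages.
 */
struct toy_page {
	union {
		unsigned long memcg_data;
		unsigned long _nr_pages;
	};
};

/* Hypothetical analogue of folio_nr_pages(): the count lives in page[1]. */
static unsigned long toy_nr_pages(const struct toy_page *head)
{
	return head[1]._nr_pages;
}

/*
 * Copy the head's memcg_data into every tail page, as a split to
 * order-0 would, then return what toy_nr_pages() reports afterwards.
 * The caller passes nr explicitly because the stored count is
 * clobbered as soon as page[1] is written.
 */
static unsigned long toy_split_to_order0(struct toy_page *head,
					 unsigned long nr)
{
	unsigned long i;

	assert(nr >= 2);	/* a large folio has at least two pages */

	for (i = 1; i < nr; i++)
		head[i].memcg_data = head[0].memcg_data;

	return toy_nr_pages(head);
}
```

With a four-page array whose page[0].memcg_data is some nonzero cookie and whose page[1]._nr_pages is 4, toy_nr_pages() returns 4 before the copy and the cookie value afterwards, which is the bogus folio_nr_pages() result described above.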