On 10/03/23 15:09, Nhat Pham wrote: > On Tue, Oct 3, 2023 at 11:39 AM Johannes Weiner <hannes@xxxxxxxxxxx> wrote: > > On Tue, Oct 03, 2023 at 11:01:24AM -0700, Nhat Pham wrote: > > > On Tue, Oct 3, 2023 at 10:13 AM Mike Kravetz <mike.kravetz@xxxxxxxxxx> wrote: > > > > On 10/02/23 17:18, Nhat Pham wrote: > > > > > > > > IIUC, huge page usage is charged in alloc_hugetlb_folio and uncharged in > > > > free_huge_folio. During migration, huge pages are allocated via > > > > alloc_migrate_hugetlb_folio, not alloc_hugetlb_folio. So, there is no > > > > charging for the migration target page and we uncharge the source page. > > > > It looks like there will be no charge for the huge page after migration? > > > > > > > > > > Ah I see! This is a bit subtle indeed. > > > > > > For the hugetlb controller, it looks like they update the cgroup info > > > inside move_hugetlb_state(), which calls hugetlb_cgroup_migrate() > > > to transfer the hugetlb cgroup info to the destination folio. > > > > > > Perhaps we can do something analogous here. > > > > > > > If my analysis above is correct, then we may need to be careful about > > > > this accounting. We may not want both source and target pages to be > > > > charged at the same time. > > > > > > We can create a variant of mem_cgroup_migrate that does not double > > > charge, but instead just copy the mem_cgroup information to the new > > > folio, and then clear that info from the old folio. That way the memory > > > usage counters are untouched. > > > > > > Somebody with more expertise on migration should fact check me > > > of course :) > > > > The only reason mem_cgroup_migrate() double charges right now is > > because it's used by replace_page_cache_folio(). In that context, the > > isolation of the old page isn't quite as thorough as with migration, > > so it cannot transfer and uncharge directly. This goes back a long > > time: 0a31bc97c80c3fa87b32c091d9a930ac19cd0c40 > > > > If you rename the current implementation to mem_cgroup_replace_page() > > for that one caller, you can add a mem_cgroup_migrate() variant which > > is charge neutral and clears old->memcg_data. This can be used for > > regular and hugetlb page migration. Something like this (totally > > untested): > > > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c > > index a4d3282493b6..17ec45bf3653 100644 > > --- a/mm/memcontrol.c > > +++ b/mm/memcontrol.c > > @@ -7226,29 +7226,14 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new) > > if (mem_cgroup_disabled()) > > return; > > > > - /* Page cache replacement: new folio already charged? */ > > - if (folio_memcg(new)) > > - return; > > - > > memcg = folio_memcg(old); > > VM_WARN_ON_ONCE_FOLIO(!memcg, old); > > if (!memcg) > > return; > > > > - /* Force-charge the new page. The old one will be freed soon */ > > - if (!mem_cgroup_is_root(memcg)) { > > - page_counter_charge(&memcg->memory, nr_pages); > > - if (do_memsw_account()) > > - page_counter_charge(&memcg->memsw, nr_pages); > > - } > > - > > - css_get(&memcg->css); > > + /* Transfer the charge and the css ref */ > > commit_charge(new, memcg); > > - > > - local_irq_save(flags); > > - mem_cgroup_charge_statistics(memcg, nr_pages); > > - memcg_check_events(memcg, folio_nid(new)); > > - local_irq_restore(flags); > > + old->memcg_data = 0; > > } > > > > DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key); > > > > Ah, I like this. Will send a fixlet based on this :) > I was scratching my head trying to figure out why we were > doing the double charging in the first place. Thanks for the context, > Johannes! Be sure to check for code similar to this in folio_migrate_flags: void folio_migrate_flags(struct folio *newfolio, struct folio *folio) { ... if (!folio_test_hugetlb(folio)) mem_cgroup_migrate(folio, newfolio); } There are many places where hugetlb is special cased. -- Mike Kravetz