On Sat, Nov 09, 2024 at 01:58:46PM -0500, Joshua Hahn wrote: [...] > > > > + free_huge_folio(folio); > > > > free_huge_folio() will call lruvec_stat_mod_folio() unconditionally but > > you are only calling it on success. This may underflow the metric. > > I was actually thinking about this too. I was wondering what would > make sense -- in the original draft of this patch, I had the charge > increment be called unconditionally as well. The idea was that > even though it would not make sense to have the stat incremented > when there is an error, it would eventually be corrected by > free_huge_folio's decrement. However, because there is nothing > stopping the user from checking the stat in this period, they may > temporarily see that the value is incremented in memory.stat, > even though they were not able to obtain this page. > On the alloc path, unconditionally calling lruvec_stat_mod_folio() after mem_cgroup_charge_hugetlb() but before free_huge_folio() is fine. Please note that if mem_cgroup_charge_hugetlb() has failed, lruvec_stat_mod_folio() will not be incrementing the memcg stat. The system level stats may get elevated for small time window but that is fine. > With that said, maybe it makes sense to increment unconditionally, > if free also decrements unconditionally. This race condition is > not something that will cause a huge problem for the user, > although users relying on userspace monitors for memory.stat > to handle memory management may see some problems. > > Maybe what would make the most sense is to do both > incrementing & decrementing conditionally as well. > Thank you for your feedback, I will iterate on this for the next version! > > > > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp) > > > +{ > > > + struct mem_cgroup *memcg = get_mem_cgroup_from_current(); > > > + int ret = 0; > > > + > > > + if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() || > > > + !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) { > > > + ret = -EOPNOTSUPP; > > > > why EOPNOTSUPP? You need to return 0 here. We do want > > lruvec_stat_mod_folio() to be called. > > In this case, I was just preserving the original code's return statements. > That is, in mem_cgroup_hugetlb_try_charge, the intended behavior > was to return -EOPNOTSUPP if any of these conditions were met. > If I understand the code correctly, calling lruvec_stat_mod_folio() > on this failure will be a noop, since either memcg doesn't account > hugetlb folios / there is no memcg / memcg is disabled. > lruvec_stat_mod_folio() does manipulate system level stats, so even if memcg accounting of hugetlb is not enabled we do want system level stats get updated.