Re: [PATCH v2 1/1] memcg/hugetlb: Adding hugeTLB counters to memcg

Roman Gushchin <roman.gushchin@xxxxxxxxx> · Thu, 24 Oct 2024 16:57:29 +0000

On Wed, Oct 23, 2024 at 02:17:27PM -0700, Yosry Ahmed wrote:
> On Wed, Oct 23, 2024 at 2:15 PM Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> >
> > Hi Joshua,
> >
> > On Wed, Oct 23, 2024 at 1:34 PM Joshua Hahn <joshua.hahnjy@xxxxxxxxx> wrote:
> > >
> > > Changelog
> > > v2:
> > >   * Enables the feature only if memcg accounts for hugeTLB usage
> > >   * Moves the counter from memcg_stat_item to node_stat_item
> > >   * Expands on motivation & justification in commitlog
> > >   * Added Suggested-by: Nhat Pham
> >
> > Changelogs usually go at the end, after ---, not as part of the commit
> > log itself.
> >
> > >
> > > This patch introduces a new counter to memory.stat that tracks hugeTLB
> > > usage, only if hugeTLB accounting is done to memory.current. This
> > > feature is enabled the same way hugeTLB accounting is enabled, via
> > > the memory_hugetlb_accounting mount flag for cgroupsv2.
> > >
> > > 1. Why is this patch necessary?
> > > Currently, memcg hugeTLB accounting is an opt-in feature [1] that adds
> > > hugeTLB usage to memory.current. However, the metric is not reported in
> > > memory.stat. Given that users often interpret memory.stat as a breakdown
> > > of the value reported in memory.current, the disparity between the two
> > > reports can be confusing. This patch solves this problem by including
> > > the metric in memory.stat as well, but only if it is also reported in
> > > memory.current (it would also be confusing if the value was reported in
> > > memory.stat, but not in memory.current)
> > >
> > > Aside from the consistentcy between the two files, we also see benefits
> >
> > consistency*
> >
> > > in observability. Userspace might be interested in the hugeTLB footprint
> > > of cgroups for many reasons. For instance, system admins might want to
> > > verify that hugeTLB usage is distributed as expected across tasks: i.e.
> > > memory-intensive tasks are using more hugeTLB pages than tasks that
> > > don't consume a lot of memory, or is seen to fault frequently. Note that
> >
> > are* seen
> >
> > > this is separate from wanting to inspect the distribution for limiting
> > > purposes (in which case, hugeTLB controller makes more sense).
> > >
> > > 2. We already have a hugeTLB controller. Why not use that?
> > > It is true that hugeTLB tracks the exact value that we want. In fact, by
> > > enabling the hugeTLB controller, we get all of the observability
> > > benefits that I mentioned above, and users can check the total hugeTLB
> > > usage, verify if it is distributed as expected, etc.
> > >
> > > With this said, there are 2 problems:
> > >   (a) They are still not reported in memory.stat, which means the
> > >       disparity between the memcg reports are still there.
> > >   (b) We cannot reasonably expect users to enable the hugeTLB controller
> > >       just for the sake of hugeTLB usage reporting, especially since
> > >       they don't have any use for hugeTLB usage enforcing [2].
> > >
> > > [1] https://lore.kernel.org/all/20231006184629.155543-1-nphamcs@xxxxxxxxx/
> > > [2] Of course, we can't make a new patch for every feature that can be
> > >     duplicated. However, since the exsting solution of enabling the
> >
> > existing*
> >
> > >     hugeTLB controller is an imperfect solution that still leaves a
> > >     discrepancy between memory.stat and memory.curent, I think that it
> > >     is reasonable to isolate the feature in this case.
> > >
> > > Suggested-by: Nhat Pham <nphamcs@xxxxxxxxx>
> > > Signed-off-by: Joshua Hahn <joshua.hahnjy@xxxxxxxxx>
> >
> > You should also CC linux-mm on such patches.
> 
> +roman.gushchin@xxxxxxxxx
> 
> CCing Roman's correct email.

Funny enough I suggested a similar functionality back to 2017:
https://lkml.iu.edu/hypermail/linux/kernel/1711.1/04891.html ,
but it was rejected at that time.
So yeah, it still sounds like a good idea to me.

As I understand, there is a new version of this patchset pending,
I'll wait for it for the review.

Thank you for looping me in!