On Thu, Dec 23, 2021 at 09:44:26AM -0800, Shakeel Butt wrote:
> On Wed, Dec 22, 2021 at 6:03 PM Roman Gushchin <guro@xxxxxx> wrote:
> >
> > On Tue, Dec 21, 2021 at 09:24:57PM -0800, Shakeel Butt wrote:
> > > The kvmalloc* allocation functions can fall back to vmalloc allocations
> > > and more often on long running machines. In addition the kernel does
> > > have __GFP_ACCOUNT kvmalloc* calls. So, often on long running machines,
> > > the memory.stat does not tell the complete picture which type of memory
> > > is charged to the memcg. So add a per-memcg vmalloc stat.
> > >
> > > Signed-off-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
> > >
> > > ---
> > > Changelog since v1:
> > > - page_memcg() within rcu lock as suggested by Muchun.
> > >
> > >  Documentation/admin-guide/cgroup-v2.rst |  3 +++
> > >  include/linux/memcontrol.h              | 21 +++++++++++++++++++++
> > >  mm/memcontrol.c                         |  1 +
> > >  mm/vmalloc.c                            |  5 +++++
> > >  4 files changed, 30 insertions(+)
> > >
> > > diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
> > > index 82c8dc91b2be..5aa368d165da 100644
> > > --- a/Documentation/admin-guide/cgroup-v2.rst
> > > +++ b/Documentation/admin-guide/cgroup-v2.rst
> > > @@ -1314,6 +1314,9 @@ PAGE_SIZE multiple when read back.
> > >         sock (npn)
> > >               Amount of memory used in network transmission buffers
> > >
> > > +       vmalloc (npn)
> > > +             Amount of memory used for vmap backed memory.
> > > +
> >
> > It's a bit sad that this counter will partially intersect with others
> > (e.g. percpu and stack), but I don't see how it can be easily fixed.
>
> I checked those again. For vmap based stack we do vmalloc() without
> __GFP_ACCOUNT and charge the stack page afterwards with
> memcg_kmem_charge_page() interface.
>
> I think we do the same for percpu as well i.e. not use GFP_ACCOUNT for
> underlying memory but later use objcg infrastructure to charge at
> finer grain.
>
> So, I think at least percpu and stack should not intersect with
> vmalloc per-memcg stat.

Ah, ok then, thanks for checking!

In general, it seems that if an allocation is backed by multiple layers
of mm code (e.g. percpu on top of vmalloc), we choose one layer to do
both memcg statistics and accounting. This makes total sense.

Thanks!
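
For anyone who wants to look at the new counter from userspace, here is a
rough sketch of pulling the "vmalloc" entry out of a cgroup's memory.stat.
It assumes only the flat "<key> <value>" per-line layout that memory.stat
already uses, with the value in bytes. The helper names memstat_lookup()
and memcg_vmalloc_bytes() are made up for this example and are not part of
the patch or of any kernel or library API.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: look up one counter in memory.stat-style text,
 * i.e. one "<key> <value>\n" pair per line.
 * Returns 0 and stores the value in *out on success, -1 if the key is
 * absent or is not followed by a decimal number.
 */
int memstat_lookup(const char *text, const char *key,
		   unsigned long long *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* Match the whole key, followed by the separating space. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ') {
			const char *val = line + klen + 1;
			char *end;
			unsigned long long v = strtoull(val, &end, 10);

			if (end == val)
				return -1;	/* no digits after the key */
			*out = v;
			return 0;
		}
		line = strchr(line, '\n');
		if (line)
			line++;		/* step past the newline */
	}
	return -1;
}

/*
 * Hypothetical helper: read the memory.stat file at stat_path and report
 * its "vmalloc" counter. The caller supplies the path; a file larger than
 * the buffer is truncated, so a key past that point would be missed.
 */
int memcg_vmalloc_bytes(const char *stat_path, unsigned long long *out)
{
	char buf[8192];
	size_t n;
	FILE *f = fopen(stat_path, "r");

	if (!f)
		return -1;
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	buf[n] = '\0';
	return memstat_lookup(buf, "vmalloc", out);
}
```

A caller would pass the memory.stat path of the cgroup of interest (under
wherever cgroup2 is mounted on that system) to memcg_vmalloc_bytes(). On a
kernel without this patch the key is simply absent and the helper returns
-1, which is why the lookup treats a missing key as an error rather than
as zero.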