On Mon, Oct 12, 2020 at 09:03:26AM +0530, Bharata B Rao wrote:
> On Fri, Oct 09, 2020 at 11:45:51AM -0700, Roman Gushchin wrote:
> > On Fri, Oct 09, 2020 at 11:34:23AM +0530, Bharata B Rao wrote:
> > 
> > Hi Bharata,
> > 
> > > Object cgroup charging is done for all the objects during
> > > allocation, but during freeing, uncharging ends up happening
> > > for only one object in the case of bulk allocation/freeing.
> > 
> > Yes, it's definitely a problem. Thank you for catching it!
> > 
> > I'm curious, did you find it in the wild or by looking into the code?
> 
> Found by looking at the code.
> 
> > 
> > > 
> > > Fix this by having a separate call to uncharge all the
> > > objects from kmem_cache_free_bulk() and by modifying
> > > memcg_slab_free_hook() to take care of bulk uncharging.
> > > 
> > > Signed-off-by: Bharata B Rao <bharata@xxxxxxxxxxxxx>
> > 
> > Please, add:
> > 
> > Fixes: 964d4bd370d5 ("mm: memcg/slab: save obj_cgroup for non-root slab objects")
> > Cc: stable@xxxxxxxxxxxxxxx
> > 
> > > ---
> > >  mm/slab.c |  2 +-
> > >  mm/slab.h | 42 +++++++++++++++++++++++++++---------------
> > >  mm/slub.c |  3 ++-
> > >  3 files changed, 30 insertions(+), 17 deletions(-)
> > > 
> > > diff --git a/mm/slab.c b/mm/slab.c
> > > index f658e86ec8cee..5c70600d8b1cc 100644
> > > --- a/mm/slab.c
> > > +++ b/mm/slab.c
> > > @@ -3440,7 +3440,7 @@ void ___cache_free(struct kmem_cache *cachep, void *objp,
> > >  	memset(objp, 0, cachep->object_size);
> > >  	kmemleak_free_recursive(objp, cachep->flags);
> > >  	objp = cache_free_debugcheck(cachep, objp, caller);
> > > -	memcg_slab_free_hook(cachep, virt_to_head_page(objp), objp);
> > > +	memcg_slab_free_hook(cachep, &objp, 1);
> > > 
> > >  	/*
> > >  	 * Skip calling cache_free_alien() when the platform is not numa.
> > > diff --git a/mm/slab.h b/mm/slab.h
> > > index 6cc323f1313af..6dd4b702888a7 100644
> > > --- a/mm/slab.h
> > > +++ b/mm/slab.h
> > > @@ -345,30 +345,42 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > >  	obj_cgroup_put(objcg);
> > >  }
> > > 
> > > -static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
> > > -					void *p)
> > > +static inline void memcg_slab_free_hook(struct kmem_cache *s_orig,
> > > +					void **p, int objects)
> > >  {
> > > +	struct kmem_cache *s;
> > >  	struct obj_cgroup *objcg;
> > > +	struct page *page;
> > >  	unsigned int off;
> > > +	int i;
> > > 
> > >  	if (!memcg_kmem_enabled())
> > >  		return;
> > > 
> > > -	if (!page_has_obj_cgroups(page))
> > > -		return;
> > > +	for (i = 0; i < objects; i++) {
> > > +		if (unlikely(!p[i]))
> > > +			continue;
> > > 
> > > -	off = obj_to_index(s, page, p);
> > > -	objcg = page_obj_cgroups(page)[off];
> > > -	page_obj_cgroups(page)[off] = NULL;
> > > +		page = virt_to_head_page(p[i]);
> > > +		if (!page_has_obj_cgroups(page))
> > > +			continue;
> > > 
> > > -	if (!objcg)
> > > -		return;
> > > +		if (!s_orig)
> > > +			s = page->slab_cache;
> > > +		else
> > > +			s = s_orig;
> > > 
> > > -	obj_cgroup_uncharge(objcg, obj_full_size(s));
> > > -	mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
> > > -			-obj_full_size(s));
> > > +		off = obj_to_index(s, page, p[i]);
> > > +		objcg = page_obj_cgroups(page)[off];
> > > +		if (!objcg)
> > > +			continue;
> > > 
> > > -	obj_cgroup_put(objcg);
> > > +		page_obj_cgroups(page)[off] = NULL;
> > > +		obj_cgroup_uncharge(objcg, obj_full_size(s));
> > > +		mod_objcg_state(objcg, page_pgdat(page), cache_vmstat_idx(s),
> > > +				-obj_full_size(s));
> > > +		obj_cgroup_put(objcg);
> > > +	}
> > >  }
> > > 
> > >  #else /* CONFIG_MEMCG_KMEM */
> > > @@ -406,8 +418,8 @@ static inline void memcg_slab_post_alloc_hook(struct kmem_cache *s,
> > >  {
> > >  }
> > > 
> > > -static inline void memcg_slab_free_hook(struct kmem_cache *s, struct page *page,
> > > -					void *p)
> > > +static inline void memcg_slab_free_hook(struct kmem_cache *s,
> > > +					void **p, int objects)
> > >  {
> > >  }
> > >  #endif /* CONFIG_MEMCG_KMEM */
> > > diff --git a/mm/slub.c b/mm/slub.c
> > > index 6d3574013b2f8..0cbe67f13946e 100644
> > > --- a/mm/slub.c
> > > +++ b/mm/slub.c
> > > @@ -3091,7 +3091,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s,
> > >  	struct kmem_cache_cpu *c;
> > >  	unsigned long tid;
> > > 
> > > -	memcg_slab_free_hook(s, page, head);
> > > +	memcg_slab_free_hook(s, &head, 1);
> > 
> > Hm, I wonder if it's better to teach do_slab_free() to handle the (cnt > 1) case correctly?
> 
> Possible, but we will have to walk the detached freelist there while
> here it is much easier to just walk the array of objects?
> 
> > 
> > >  redo:
> > >  	/*
> > >  	 * Determine the currently cpus per cpu slab.
> > > @@ -3253,6 +3253,7 @@ void kmem_cache_free_bulk(struct kmem_cache *s, size_t size, void **p)
> > >  	if (WARN_ON(!size))
> > >  		return;
> > > 
> > > +	memcg_slab_free_hook(s, p, size);
> > 
> > Then you don't need this change.
> > 
> > Otherwise memcg_slab_free_hook() can be called twice for the same object. It's ok from
> > accounting correctness perspective, because the first call will zero out the objcg pointer,
> > but still much better to be avoided.
> 
> Yes, for that one object there will be one additional uncharge call,
> but as you note it is handled by the !objcg check.

Got it. Thanks for the explanation!

Acked-by: Roman Gushchin <guro@xxxxxx>

Thanks!
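[Editorial note, not part of the thread: below is a minimal, illustrative sketch of the bulk allocation/free pattern the patch is about. The cache name "demo_cache", its object size, and NR_OBJS are made up for the example; only the kmem_cache_alloc_bulk()/kmem_cache_free_bulk() calls come from the quoted patch context.]

/*
 * Illustrative only: every object allocated from a SLAB_ACCOUNT cache is
 * charged to the caller's memcg individually, so a bulk free must also
 * uncharge each object.  Before this fix, the bulk-free path uncharged
 * only one object.
 */
#include <linux/slab.h>
#include <linux/module.h>

#define NR_OBJS 16	/* arbitrary, for the example */

static struct kmem_cache *demo_cache;	/* hypothetical cache */

static int __init demo_init(void)
{
	void *objs[NR_OBJS];
	int n;

	/* SLAB_ACCOUNT: allocations from this cache are memcg-charged. */
	demo_cache = kmem_cache_create("demo_cache", 256, 0, SLAB_ACCOUNT, NULL);
	if (!demo_cache)
		return -ENOMEM;

	/* Each successfully allocated object gets charged... */
	n = kmem_cache_alloc_bulk(demo_cache, GFP_KERNEL, NR_OBJS, objs);

	/*
	 * ...so freeing them in bulk must uncharge all n objects, which is
	 * what the memcg_slab_free_hook(s, p, size) call added to
	 * kmem_cache_free_bulk() takes care of.
	 */
	if (n)
		kmem_cache_free_bulk(demo_cache, n, objs);

	kmem_cache_destroy(demo_cache);
	return 0;
}
module_init(demo_init);
MODULE_LICENSE("GPL");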