Re: [PATCH 57/62] memcg: Convert object cgroups from struct page to struct slab

Johannes Weiner <hannes@xxxxxxxxxxx> · Mon, 11 Oct 2021 13:13:18 -0400

CC Roman for the slab tracking bits

On Mon, Oct 04, 2021 at 02:46:45PM +0100, Matthew Wilcox (Oracle) wrote:
> @@ -537,41 +537,41 @@ static inline bool PageMemcgKmem(struct page *page)
>  }
>  
>  /*
> - * page_objcgs - get the object cgroups vector associated with a page
> - * @page: a pointer to the page struct
> + * slab_objcgs - get the object cgroups vector associated with a page
> + * @slab: a pointer to the slab struct
>   *
> - * Returns a pointer to the object cgroups vector associated with the page,
> - * or NULL. This function assumes that the page is known to have an
> + * Returns a pointer to the object cgroups vector associated with the slab,
> + * or NULL. This function assumes that the slab is known to have an
>   * associated object cgroups vector. It's not safe to call this function
>   * against pages, which might have an associated memory cgroup: e.g.
>   * kernel stack pages.
>   */
> -static inline struct obj_cgroup **page_objcgs(struct page *page)
> +static inline struct obj_cgroup **slab_objcgs(struct slab *slab)
>  {
> -	unsigned long memcg_data = READ_ONCE(page->memcg_data);
> +	unsigned long memcg_data = READ_ONCE(slab->memcg_data);
>  
> -	VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS), page);
> -	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
> +	VM_BUG_ON_PAGE(memcg_data && !(memcg_data & MEMCG_DATA_OBJCGS), slab_page(slab));
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, slab_page(slab));
>  
>  	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);
>  }

I like this whole patch series, but I think for memcg this is a
particularly nice cleanup.

Because right now we can have user pages pointing to a memcg, random
alloc_page(GFP_ACCOUNT) pages pointing to an objcg, and slab pages
pointing to an array of objcgs - all in the same memcg_data member.

After your patch, slab->memcg_data points to an array of objcgs,
period. The only time it doesn't is when there is a bug. Once the
memcg_data member is no longer physically shared between page and
slab, we can do:

	struct slab {
		struct obj_cgroup **objcgs;
	};

and ditch the accessor function altogether.

> - * page_objcgs_check - get the object cgroups vector associated with a page
> - * @page: a pointer to the page struct
> + * slab_objcgs_check - get the object cgroups vector associated with a page
> + * @slab: a pointer to the slab struct
>   *
> - * Returns a pointer to the object cgroups vector associated with the page,
> - * or NULL. This function is safe to use if the page can be directly associated
> + * Returns a pointer to the object cgroups vector associated with the slab,
> + * or NULL. This function is safe to use if the slab can be directly associated
>   * with a memory cgroup.
>   */
> -static inline struct obj_cgroup **page_objcgs_check(struct page *page)
> +static inline struct obj_cgroup **slab_objcgs_check(struct slab *slab)
>  {
> -	unsigned long memcg_data = READ_ONCE(page->memcg_data);
> +	unsigned long memcg_data = READ_ONCE(slab->memcg_data);
>  
>  	if (!memcg_data || !(memcg_data & MEMCG_DATA_OBJCGS))
>  		return NULL;
>  
> -	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, page);
> +	VM_BUG_ON_PAGE(memcg_data & MEMCG_DATA_KMEM, slab_page(slab));
>  
>  	return (struct obj_cgroup **)(memcg_data & ~MEMCG_DATA_FLAGS_MASK);

This is a bit weird.

The function is used in one place, to check whether a random page is a
slab page. It's essentially a generic type check on the page!

After your changes, you pass a struct slab that might well be invalid
if this isn't a slab page, and you rely on the PAGE's memcg_data to
tell you whether this is the case. It works because page->memcg_data
is overlaid with slab->memcg_data, but that won't be the case if we
allocate struct slab separately.

To avoid that trap down the road, I think it would be better to keep
the *page* the ambiguous object for now, and only resolve to struct
slab after the type check. So that every time you see struct slab, you
know it's valid.

In fact, I think it would be best to just inline page_objcgs_check()
into its sole caller. It would clarify the resolution from wildcard
page to valid struct slab quite a bit:

> @@ -2819,38 +2819,39 @@ int memcg_alloc_page_obj_cgroups(struct page *page, struct kmem_cache *s,
>   */
>  struct mem_cgroup *mem_cgroup_from_obj(void *p)
>  {
> -	struct page *page;
> +	struct slab *slab;
>  
>  	if (mem_cgroup_disabled())
>  		return NULL;
>  
> -	page = virt_to_head_page(p);
> +	slab = virt_to_slab(p);
>  
>  	/*
>  	 * Slab objects are accounted individually, not per-page.
>  	 * Memcg membership data for each individual object is saved in
> -	 * the page->obj_cgroups.
> +	 * the slab->obj_cgroups.
>  	 */
> -	if (page_objcgs_check(page)) {
> +	if (slab_objcgs_check(slab)) {

I.e. do this instead:

	page = virt_to_head_page(p);

	/* object is backed by slab */
	if (page->memcg_data & MEMCG_DATA_OBJCGS) {
		struct slab *slab = (struct slab *)page;

		objcg = slab_objcgs(...)[]
		return objcg ? obj_cgroup_memcg(objcg): NULL;
	}

	/* object is backed by a regular kernel page */
	return page_memcg_check(page);

>  		struct obj_cgroup *objcg;
>  		unsigned int off;
>  
> -		off = obj_to_index(page->slab_cache, page, p);
> -		objcg = page_objcgs(page)[off];
> +		off = obj_to_index(slab->slab_cache, slab, p);
> +		objcg = slab_objcgs(slab)[off];
>  		if (objcg)
>  			return obj_cgroup_memcg(objcg);
>  
>  		return NULL;
>  	}
>  
> +	/* I am pretty sure this could just be 'return NULL' */

No, we could still be looking at a regular page that is being tracked
by memcg. People do (void *)__get_free_pages(GFP_ACCOUNT). So this
needs to stay 'return page_memcg_check()'.