On Mon, Feb 03, 2020 at 03:46:27PM -0500, Johannes Weiner wrote:
> On Mon, Feb 03, 2020 at 10:34:52AM -0800, Roman Gushchin wrote:
> > On Mon, Feb 03, 2020 at 01:27:56PM -0500, Johannes Weiner wrote:
> > > On Mon, Jan 27, 2020 at 09:34:41AM -0800, Roman Gushchin wrote:
> > > > Allocate and release memory to store obj_cgroup pointers for each
> > > > non-root slab page. Reuse page->mem_cgroup pointer to store a pointer
> > > > to the allocated space.
> > > >
> > > > To distinguish between obj_cgroups and memcg pointers in case
> > > > when it's not obvious which one is used (as in page_cgroup_ino()),
> > > > let's always set the lowest bit in the obj_cgroup case.
> > > >
> > > > Signed-off-by: Roman Gushchin <guro@xxxxxx>
> > > > ---
> > > >  include/linux/mm.h       | 25 ++++++++++++++++++--
> > > >  include/linux/mm_types.h |  5 +++-
> > > >  mm/memcontrol.c          |  5 ++--
> > > >  mm/slab.c                |  3 ++-
> > > >  mm/slab.h                | 51 +++++++++++++++++++++++++++++++++++++++-
> > > >  mm/slub.c                |  2 +-
> > > >  6 files changed, 83 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > > > index 080f8ac8bfb7..65224becc4ca 100644
> > > > --- a/include/linux/mm.h
> > > > +++ b/include/linux/mm.h
> > > > @@ -1264,12 +1264,33 @@ static inline void set_page_links(struct page *page, enum zone_type zone,
> > > >  #ifdef CONFIG_MEMCG
> > > >  static inline struct mem_cgroup *page_memcg(struct page *page)
> > > >  {
> > > > -        return page->mem_cgroup;
> > > > +        struct mem_cgroup *memcg = page->mem_cgroup;
> > > > +
> > > > +        /*
> > > > +         * The lowest bit set means that memcg isn't a valid memcg pointer,
> > > > +         * but a obj_cgroups pointer. In this case the page is shared and
> > > > +         * isn't charged to any specific memory cgroup. Return NULL.
> > > > +         */
> > > > +        if ((unsigned long) memcg & 0x1UL)
> > > > +                memcg = NULL;
> > > > +
> > > > +        return memcg;
> > >
> > > That should really WARN instead of silently returning NULL.
> > > Which callsite optimistically asks a page's cgroup when it has no idea
> > > whether that page is actually a userpage or not?
> >
> > For instance, look at page_cgroup_ino() called from the
> > reading /proc/kpageflags.
>
> But that checks PageSlab() and implements memcg_from_slab_page() to
> handle that case properly. And that's what we expect all callsites to
> do: make sure that the question asked actually makes sense, instead of
> having the interface paper over bogus requests.
>
> If that function is completely racy and PageSlab isn't stable, then it
> should really just open-code the lookup, rather than require weakening
> the interface for everybody else.

Why though?

Another example: depending on the machine config and platform, a process
stack can be a vmalloc allocation, a slab allocation, or a "high-order
slab allocation", which is executed by the page allocator directly.

It's kinda nice to have a function that hides accounting details
and returns a valid memcg pointer for any kind of object.

To me it seems to be a valid question:
for a given kernel object, give me a pointer to the memory cgroup.

Why is that weakening?

Moreover, open-coding of this question leads to bugs like the one fixed by
ec9f02384f60 ("mm: workingset: fix vmstat counters for shadow nodes").

>
> > > >  static inline struct mem_cgroup *page_memcg_rcu(struct page *page)
> > > >  {
> > > > +        struct mem_cgroup *memcg = READ_ONCE(page->mem_cgroup);
> > > > +
> > > >          WARN_ON_ONCE(!rcu_read_lock_held());
> > > > -        return READ_ONCE(page->mem_cgroup);
> > > > +
> > > > +        /*
> > > > +         * The lowest bit set means that memcg isn't a valid memcg pointer,
> > > > +         * but a obj_cgroups pointer. In this case the page is shared and
> > > > +         * isn't charged to any specific memory cgroup. Return NULL.
> > > > +         */
> > > > +        if ((unsigned long) memcg & 0x1UL)
> > > > +                memcg = NULL;
> > > > +
> > > > +        return memcg;
> > >
> > > Same here.
> > >
> > > >  }
> > > >  #else
> > > >  static inline struct mem_cgroup *page_memcg(struct page *page)
> > > > diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> > > > index 270aa8fd2800..5102f00f3336 100644
> > > > --- a/include/linux/mm_types.h
> > > > +++ b/include/linux/mm_types.h
> > > > @@ -198,7 +198,10 @@ struct page {
> > > >          atomic_t _refcount;
> > > >
> > > >  #ifdef CONFIG_MEMCG
> > > > -        struct mem_cgroup *mem_cgroup;
> > > > +        union {
> > > > +                struct mem_cgroup *mem_cgroup;
> > > > +                struct obj_cgroup **obj_cgroups;
> > > > +        };
> > >
> > > Since you need the casts in both cases anyway, it's safer (and
> > > simpler) to do
> > >
> > >         unsigned long mem_cgroup;
> > >
> > > to prevent accidental direct derefs in future code.
> >
> > Agree. Maybe even mem_cgroup_data?
>
> Personally, I don't think the suffix adds much. The type makes it so
> the compiler catches any accidental use, and access is very
> centralized so greppability doesn't matter much.