On Thu, Sep 24, 2020 at 01:27:00PM -0700, Roman Gushchin wrote: > On Thu, Sep 24, 2020 at 03:45:08PM -0400, Johannes Weiner wrote: > > On Tue, Sep 22, 2020 at 01:36:57PM -0700, Roman Gushchin wrote: > > > Currently there are many open-coded reads and writes of the > > > page->mem_cgroup pointer, as well as a couple of read helpers, > > > which are barely used. > > > > > > It creates an obstacle on a way to reuse some bits of the pointer > > > for storing additional bits of information. In fact, we already do > > > this for slab pages, where the last bit indicates that a pointer has > > > an attached vector of objcg pointers instead of a regular memcg > > > pointer. > > > > > > This commits introduces 4 new helper functions and converts all > > > raw accesses to page->mem_cgroup to calls of these helpers: > > > struct mem_cgroup *page_mem_cgroup(struct page *page); > > > struct mem_cgroup *page_mem_cgroup_check(struct page *page); > > > void set_page_mem_cgroup(struct page *page, struct mem_cgroup *memcg); > > > void clear_page_mem_cgroup(struct page *page); > > > > Sounds reasonable to me! > > > > > page_mem_cgroup_check() is intended to be used in cases when the page > > > can be a slab page and have a memcg pointer pointing at objcg vector. > > > It does check the lowest bit, and if set, returns NULL. > > > page_mem_cgroup() contains a VM_BUG_ON_PAGE() check for the page not > > > being a slab page. So do set_page_mem_cgroup() and clear_page_mem_cgroup(). > > > > > > To make sure nobody uses a direct access, struct page's > > > mem_cgroup/obj_cgroups is converted to unsigned long memcg_data. > > > Only new helpers and a couple of slab-accounting related functions > > > access this field directly. > > > > > > page_memcg() and page_memcg_rcu() helpers defined in mm.h are removed. > > > New page_mem_cgroup() is a direct analog of page_memcg(), while > > > page_memcg_rcu() has a single call site in a small rcu-read-lock > > > section, so it's just not worth it to have a separate helper. So > > > it's replaced with page_mem_cgroup() too. > > > > page_memcg_rcu() does READ_ONCE(). We need to keep that for lockless > > accesses. > > Ok, how about page_memcg() and page_objcgs() which always do READ_ONCE()? > Because page_memcg_rcu() has only a single call site, I would prefer to > have one helper instead of two. Well, page_objcgs() needs it everywhere because the pointer changes locklessly, so that makes sense to me. For page_memcg(), I'll have to NAK that approach. It's not justified to impose ordering cost on a hundred callers that don't need it. And it muddies the serialization rules of the interface. > > > @@ -2956,8 +2935,14 @@ struct mem_cgroup *mem_cgroup_from_obj(void *p) > > > return NULL; > > > } > > > > > > - /* All other pages use page->mem_cgroup */ > > > - return page->mem_cgroup; > > > + /* > > > + * page_mem_cgroup_check() is used here, because page_has_obj_cgroups() > > > + * check above could fail because the object cgroups vector wasn't set > > > + * at that moment, but it can be set concurrently. > > > + * page_mem_cgroup_check(page) will guarantee tat a proper memory > > > + * cgroup pointer or NULL will be returned. > > > + */ > > > + return page_mem_cgroup_check(page); > > > > The code right now doesn't look quite safe. As per above, without the > > READ_ONCE the compiler might issue multiple loads and we may get a > > pointer with the low bit set. > > > > Maybe slightly off-topic, but what are "all other pages" in general? > > I don't see any callsites that ask for ownership on objects whose > > backing pages may belong to a single memcg. That wouldn't seem to make > > too much sense. Unless I'm missing something, this function should > > probably tighten up its scope a bit and only work on stuff that is > > actually following the obj_cgroup protocol. > > Kernel stacks can be slabs or generic pages/vmallocs. Also large kmallocs > are using the page allocator, so they don't follow the objcg protocol. Ah, that's super valuable information! Instead of deleting the "all other pages" comment, could you please elaborate on it? Thanks!