On Thu, May 07, 2020 at 03:26:31PM -0700, Roman Gushchin wrote: > On Thu, May 07, 2020 at 05:03:14PM -0400, Johannes Weiner wrote: > > On Wed, Apr 22, 2020 at 01:46:55PM -0700, Roman Gushchin wrote: > > > --- a/mm/memcontrol.c > > > +++ b/mm/memcontrol.c > > > @@ -257,6 +257,78 @@ struct cgroup_subsys_state *vmpressure_to_css(struct vmpressure *vmpr) > > > } > > > > > > #ifdef CONFIG_MEMCG_KMEM > > > +extern spinlock_t css_set_lock; > > > + > > > +static void obj_cgroup_release(struct percpu_ref *ref) > > > +{ > > > + struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt); > > > + struct mem_cgroup *memcg; > > > + unsigned int nr_bytes; > > > + unsigned int nr_pages; > > > + unsigned long flags; > > > + > > > + nr_bytes = atomic_read(&objcg->nr_charged_bytes); > > > + WARN_ON_ONCE(nr_bytes & (PAGE_SIZE - 1)); > > > + nr_pages = nr_bytes >> PAGE_SHIFT; > > > > What guarantees that we don't have a partial page in there at this > > point? I guess any outstanding allocations would pin the objcg, so > > when it's released all objects have been freed. > > Right, this is exactly the reason why there can't be a partial page > at this point. > > > > > But if that's true, how can we have full pages remaining in there now? > > Imagine the following sequence: > 1) CPU0: objcg == stock->cached_objcg > 2) CPU1: we do a small allocation (e.g. 92 bytes), page is charged > 3) CPU1: a process from another memcg is allocating something, stock if flushed, > objcg->nr_charged_bytes = PAGE_SIZE - 92 > 5) CPU0: we do release this object, 92 bytes are added to stock->nr_bytes > 6) CPU0: stock is flushed, 92 bytes are added to objcg->nr_charged_bytes > > In the result, nr_charged_bytes == PAGE_SIZE. This PAGE will be uncharged > in obj_cgroup_release(). > > I've double checked this, it's actually pretty easy to trigger in the real life. Ah, so no outstanding allocations, but a full page split between the percpu cache and objcg->nr_charged_bytes. Would it simplify things if refill_obj_stock() drained on >= PAGE_SIZE stock instead of > PAGE_SIZE? Otherwise, the scenario above would be good to have as a comment as the drain on release is not self-explanatory. > > > +int obj_cgroup_charge(struct obj_cgroup *objcg, gfp_t gfp, size_t size) > > > +{ > > > + struct mem_cgroup *memcg; > > > + unsigned int nr_pages, nr_bytes; > > > + int ret; > > > + > > > + if (consume_obj_stock(objcg, size)) > > > + return 0; > > > + > > > + rcu_read_lock(); > > > + memcg = obj_cgroup_memcg(objcg); > > > + css_get(&memcg->css); > > > + rcu_read_unlock(); > > > + > > > + nr_pages = size >> PAGE_SHIFT; > > > + nr_bytes = size & (PAGE_SIZE - 1); > > > + > > > + if (nr_bytes) > > > + nr_pages += 1; > > > + > > > + ret = __memcg_kmem_charge(memcg, gfp, nr_pages); > > > > If consume_obj_stock() fails because some other memcg is cached, > > should this try to consume the partial page in objcg->nr_charged_bytes > > before getting more pages? > > We can definitely do it, but I'm not sure if it's good for the performance. > > Dealing with nr_charged_bytes will require up to two atomic writes, > so calling __memcg_kmem_charge() can be faster if memcg is cached > on percpu stock. Hm, but it's the slowpath. And sooner or later somebody has to deal with the remaining memory in there. > > > + if (!ret && nr_bytes) > > > + refill_obj_stock(objcg, PAGE_SIZE - nr_bytes); > > > > This will put the cgroup into the cache if the allocation resulted in > > a partially free page. > > > > But if this was a page allocation, we may have objcg->nr_cache_bytes > > from a previous subpage allocation that we should probably put back > > into the stock. > > Yeah, we can do this, but I don't know if there will be any benefits. It's mostly about understanding the code. > Actually we don't wanna to touch objcg->nr_cache_bytes too often, as > it can become a contention point if there are many threads allocating > in the memory cgroup. > > So maybe we want to do the opposite: relax it a bit and stop flushing > it on every stock refill and flush only if it exceeds a certain value. That could be useful, yes. > > It's not a common case, I'm just trying to understand what > > objcg->nr_cache_bytes holds and when it does so. > > So it's actually a centralized leftover from the rounding of the actual > charge to the page size. It would be good to add code comments explaining this. Thanks!