On Mon, Sep 16, 2019 at 02:56:11PM +0200, Johannes Weiner wrote:
> On Thu, Sep 05, 2019 at 02:45:45PM -0700, Roman Gushchin wrote:
> > Introduce an API to charge subpage objects to the memory cgroup.
> > The API will be used by the new slab memory controller. Later it
> > can also be used to implement percpu memory accounting.
> > In both cases, a single page can be shared between multiple cgroups
> > (and in percpu case a single allocation is split over multiple pages),
> > so it's not possible to use page-based accounting.
> >
> > The implementation is based on percpu stocks. Memory cgroups are still
> > charged in pages, and the residue is stored in percpu stock, or on the
> > memcg itself, when it's necessary to flush the stock.
>
> Did you just implement a slab allocator for page_counter to track
> memory consumed by the slab allocator? :)
>
> > @@ -2500,8 +2577,9 @@ void mem_cgroup_handle_over_high(void)
> >  }
> >
> >  static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> > -		      unsigned int nr_pages)
> > +		      unsigned int amount, bool subpage)
> >  {
> > +	unsigned int nr_pages = subpage ? ((amount >> PAGE_SHIFT) + 1) : amount;
> >  	unsigned int batch = max(MEMCG_CHARGE_BATCH, nr_pages);
> >  	int nr_retries = MEM_CGROUP_RECLAIM_RETRIES;
> >  	struct mem_cgroup *mem_over_limit;
> > @@ -2514,7 +2592,9 @@ static int try_charge(struct mem_cgroup *memcg, gfp_t gfp_mask,
> >  	if (mem_cgroup_is_root(memcg))
> >  		return 0;
> >  retry:
> > -	if (consume_stock(memcg, nr_pages))
> > +	if (subpage && consume_subpage_stock(memcg, amount))
> > +		return 0;
> > +	else if (!subpage && consume_stock(memcg, nr_pages))
> >  		return 0;
>
> The layering here isn't clean. We have an existing per-cpu cache to
> batch-charge the page counter. Why does the new subpage allocator not
> sit on *top* of this, instead of wedged in between?
>
> I think what it should be is a try_charge_bytes() that simply gets one
> page from try_charge() and then does its byte tracking, regardless of
> how try_charge() chooses to implement its own page tracking.
>
> That would avoid the awkward @amount + @subpage multiplexing, as well
> as annotating all existing callsites of try_charge() with a
> non-descript "false" parameter.
>
> You can still reuse the stock data structures, use the lower bits of
> stock->nr_bytes for a different cgroup etc., but the charge API should
> really be separate.

Hm, I kinda like the idea, however there is a complication: for the subpage
accounting the css reference management is done in a different way, so that
all existing code should avoid changing the css refcounter. So I'd need to
pass a boolean argument anyway.

But let me try to write this down, hopefully v2 will be cleaner.

Thank you!
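
For illustration only, the layering suggested in the review (a separate
try_charge_bytes() sitting on top of an unchanged, page-based try_charge())
might look roughly like the userspace toy model below. This is a sketch under
several assumptions, not the kernel implementation: the struct fields are
invented, the byte residue lives in a single stock_bytes field on the memcg
instead of a per-cpu stock, there is no gfp_mask, reclaim or retry logic, and
the css reference question raised above is not modelled at all. Only the names
try_charge() and try_charge_bytes() come from the thread.

```c
/*
 * Userspace toy model, NOT kernel code. Every type and field here is a
 * simplified stand-in invented for illustration.
 */
#define PAGE_SHIFT 12
#define PAGE_SIZE  (1u << PAGE_SHIFT)

struct mem_cgroup {
	unsigned long limit_pages;	/* hypothetical hard limit, in pages */
	unsigned long charged_pages;	/* pages charged so far */
	unsigned int stock_bytes;	/* prepaid bytes not yet handed out */
};

/* Page-level charge: stays purely page-based, no subpage flag. */
static int try_charge(struct mem_cgroup *memcg, unsigned int nr_pages)
{
	if (memcg->charged_pages + nr_pages > memcg->limit_pages)
		return -1;	/* stands in for -ENOMEM */
	memcg->charged_pages += nr_pages;
	return 0;
}

/* Byte-level charge layered on top of try_charge(). */
static int try_charge_bytes(struct mem_cgroup *memcg, unsigned int nr_bytes)
{
	unsigned int shortfall, nr_pages;

	/* Fast path: the byte residue already covers the request. */
	if (memcg->stock_bytes >= nr_bytes) {
		memcg->stock_bytes -= nr_bytes;
		return 0;
	}

	/* Round the shortfall up to whole pages and charge those pages. */
	shortfall = nr_bytes - memcg->stock_bytes;
	nr_pages = (shortfall + PAGE_SIZE - 1) >> PAGE_SHIFT;
	if (try_charge(memcg, nr_pages))
		return -1;	/* residue is left untouched on failure */

	/* Keep whatever is left of the newly charged pages as residue. */
	memcg->stock_bytes = memcg->stock_bytes + nr_pages * PAGE_SIZE - nr_bytes;
	return 0;
}
```

In this model, a memcg limited to 2 pages that charges 100 bytes and then
4000 bytes ends up with 2 pages charged and 4092 bytes of residue. A further
5000-byte charge fails because it would need a third page, and existing
try_charge() callers never see a subpage argument.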