On Sat, Mar 15, 2025 at 10:49:29AM -0700, Shakeel Butt wrote:
> From: Vlastimil Babka <vbabka@xxxxxxx>
>
> When handling slab objects, we use obj_cgroup_[un]charge() for
> (un)charging and mod_objcg_state() to account NR_SLAB_[UN]RECLAIMABLE_B.
> All these operations use the percpu stock for performance. However, with
> the calls being separate, the stock_lock is taken twice in each case.
>
> By refactoring the code, we can turn mod_objcg_state() into
> __account_obj_stock(), which is called on a stock that's already locked
> and validated. On the charging side we can call this function from
> consume_obj_stock() when it succeeds, and from refill_obj_stock() in the
> fallback. We just expand the parameters of these functions as necessary.
> On the uncharge side, __memcg_slab_free_hook() becomes just a call to
> refill_obj_stock().
>
> Other callers of obj_cgroup_[un]charge() (i.e. not slab) simply pass the
> extra parameters as NULL/zeroes to skip the __account_obj_stock()
> operation.
>
> In __memcg_slab_post_alloc_hook() we now charge each object separately,
> but that's not a problem, as we did call mod_objcg_state() for each
> object separately anyway, and most allocations are non-bulk. This
> could be improved by batching all operations until slab_pgdat(slab)
> changes.
>
> Some preliminary benchmarking with a kfree(kmalloc()) loop of 10M
> iterations with/without __GFP_ACCOUNT:
>
> Before the patch:
> kmalloc/kfree !memcg: 581390144 cycles
> kmalloc/kfree memcg:  783689984 cycles
>
> After the patch:
> kmalloc/kfree memcg:  658723808 cycles
>
> More than half of the overhead of __GFP_ACCOUNT relative to the
> non-accounted case appears to be eliminated.

Oh, this is huge! I believe the next step is to also integrate the
refcnt management, which might shave off a few more percent.

Reviewed-by: Roman Gushchin <roman.gushchin@xxxxxxxxx>