On Tue, Apr 13, 2021 at 09:20:25PM -0400, Waiman Long wrote:
> Before the new slab memory controller with per object byte charging,
> charging and vmstat data update happen only when new slab pages are
> allocated or freed. Now they are done with every kmem_cache_alloc()
> and kmem_cache_free(). This causes additional overhead for workloads
> that generate a lot of alloc and free calls.
>
> The memcg_stock_pcp is used to cache byte charge for a specific
> obj_cgroup to reduce that overhead. To further reducing it, this patch
> makes the vmstat data cached in the memcg_stock_pcp structure as well
> until it accumulates a page size worth of update or when other cached
> data change.
>
> On a 2-socket Cascade Lake server with instrumentation enabled and this
> patch applied, it was found that about 17% (946796 out of 5515184) of the
> time when __mod_obj_stock_state() is called leads to an actual call to
> mod_objcg_state() after initial boot. When doing parallel kernel build,
> the figure was about 16% (21894614 out of 139780628). So caching the
> vmstat data reduces the number of calls to mod_objcg_state() by more
> than 80%.

Right, but mod_objcg_state() is itself already percpu-cached. What's the
benefit of avoiding calls to it with another percpu cache?
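For readers following the thread, the batching described in the quoted changelog can be modeled roughly as below. This is a minimal single-CPU, user-space sketch, not the patch's code: every name prefixed with sim_ is hypothetical, the flush rule (flush when the pending delta exceeds a page, or when the cached objcg/stat item changes) is an assumption based only on the changelog wording, and sim_mod_objcg_state() merely stands in for mod_objcg_state() and counts how often it is reached.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed page size for the flush threshold (hypothetical constant). */
#define SIM_PAGE_SIZE 4096L

/* Stand-in for mod_objcg_state(): counts calls and applies the delta. */
static long flush_calls;
static long global_stat[2][2]; /* [objcg id][stat item], ids 0..1 only */

static void sim_mod_objcg_state(int objcg, int idx, long nr)
{
	flush_calls++;
	global_stat[objcg][idx] += nr;
}

/* Hypothetical per-cpu stock caching a pending vmstat delta. */
struct sim_stock {
	int objcg;  /* cached objcg id, -1 if nothing is cached */
	int idx;    /* cached stat item */
	long delta; /* bytes accumulated but not yet flushed */
};

/* Push any pending delta down to the stand-in updater. */
static void sim_stock_flush(struct sim_stock *s)
{
	if (s->objcg >= 0 && s->delta)
		sim_mod_objcg_state(s->objcg, s->idx, s->delta);
	s->delta = 0;
}

/* Rough model of __mod_obj_stock_state(): batch until a page or a switch. */
static void sim_mod_obj_stock_state(struct sim_stock *s, int objcg,
				    int idx, long nr)
{
	if (s->objcg != objcg || s->idx != idx) {
		/* Other cached data changed: flush the old entry first. */
		sim_stock_flush(s);
		s->objcg = objcg;
		s->idx = idx;
	}
	s->delta += nr;
	if (labs(s->delta) > SIM_PAGE_SIZE)
		sim_stock_flush(s);
}
```

With 100 consecutive 64-byte charges against one objcg, the stand-in updater is reached once (on the 65th charge, when 4160 bytes exceed the threshold) instead of 100 times; switching to another objcg then forces a flush of the remaining 2240 bytes. Note that the sketch only models this first caching layer: it says nothing about how much work each call into mod_objcg_state() costs when that function is itself percpu-batched, which is exactly what the question above is asking.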