With the recent introduction of the new slab memory controller, we
eliminate the need for having separate kmemcaches for each memory
cgroup and reduce overall kernel memory usage. However, we also add
additional memory accounting overhead to each call of
kmem_cache_alloc() and kmem_cache_free(). Workloads that perform a lot
of kmemcache allocations and de-allocations may therefore experience a
performance regression, as illustrated in [1].

A simple kernel module that performs a repeated loop of 100,000,000
kmem_cache_alloc() and kmem_cache_free() of a 64-byte object at module
init was used for benchmarking. The execution times to load the kernel
module with and without memory accounting were:

  with accounting = 6.798s
  w/o  accounting = 1.758s

That is an increase of 5.04s (287%). With this patchset applied, the
execution time became 4.254s. So the memory accounting overhead is now
2.496s, a 50% reduction.

It was found that a major part of the memory accounting overhead is
caused by the local_irq_save()/local_irq_restore() sequences used when
updating the local stock charge bytes and the vmstat arrays, at least
on x86 systems. There are two such sequences in kmem_cache_alloc() and
two in kmem_cache_free(). This patchset tries to reduce the use of
such sequences as much as possible. In fact, it eliminates them in the
common case. Another part of this patchset caches the vmstat data
updates in the local stock as well, which also helps.

[1] https://lore.kernel.org/linux-mm/20210408193948.vfktg3azh2wrt56t@gabell/T/#u

Waiman Long (5):
  mm/memcg: Pass both memcg and lruvec to mod_memcg_lruvec_state()
  mm/memcg: Introduce obj_cgroup_uncharge_mod_state()
  mm/memcg: Cache vmstat data in percpu memcg_stock_pcp
  mm/memcg: Separate out object stock data into its own struct
  mm/memcg: Optimize user context object stock access

 include/linux/memcontrol.h |  14 ++-
 mm/memcontrol.c            | 198 ++++++++++++++++++++++++++++++++-----
 mm/percpu.c                |   9 +-
 mm/slab.h                  |  32 +++---
 4 files changed, 195 insertions(+), 58 deletions(-)

-- 
2.18.1