On Fri, Apr 09, 2021 at 07:18:37PM -0400, Waiman Long wrote:
> With the recent introduction of the new slab memory controller, we
> eliminate the need for having separate kmemcaches for each memory
> cgroup and reduce overall kernel memory usage. However, we also add
> additional memory accounting overhead to each call of kmem_cache_alloc()
> and kmem_cache_free().
>
> For workloads that require a lot of kmemcache allocations and
> de-allocations, they may experience performance regression as illustrated
> in [1].
>
> With a simple kernel module that performs repeated loop of 100,000,000
> kmem_cache_alloc() and kmem_cache_free() of 64-byte object at module
> init. The execution time to load the kernel module with and without
> memory accounting were:
>
>   with accounting = 6.798s
>   w/o  accounting = 1.758s
>
> That is an increase of 5.04s (287%). With this patchset applied, the
> execution time became 4.254s. So the memory accounting overhead is now
> 2.496s which is a 50% reduction.

Hi Waiman!

Thank you for working on this, it's indeed very useful!
A couple of questions:
1) Did your config include lockdep or not?
2) Do you have a (rough) estimate of how much each change contributes
   to the overall reduction?

Thanks!

>
> It was found that a major part of the memory accounting overhead
> is caused by the local_irq_save()/local_irq_restore() sequences in
> updating local stock charge bytes and vmstat array, at least in x86
> systems. There are two such sequences in kmem_cache_alloc() and two
> in kmem_cache_free(). This patchset tries to reduce the use of such
> sequences as much as possible. In fact, it eliminates them in the common
> case. Another part of this patchset to cache the vmstat data update in
> the local stock as well which also helps.
>
> [1] https://lore.kernel.org/linux-mm/20210408193948.vfktg3azh2wrt56t@gabell/T/#u
>
> Waiman Long (5):
>   mm/memcg: Pass both memcg and lruvec to mod_memcg_lruvec_state()
>   mm/memcg: Introduce obj_cgroup_uncharge_mod_state()
>   mm/memcg: Cache vmstat data in percpu memcg_stock_pcp
>   mm/memcg: Separate out object stock data into its own struct
>   mm/memcg: Optimize user context object stock access
>
>  include/linux/memcontrol.h |  14 ++-
>  mm/memcontrol.c            | 198 ++++++++++++++++++++++++++++++++-----
>  mm/percpu.c                |   9 +-
>  mm/slab.h                  |  32 +++---
>  4 files changed, 195 insertions(+), 58 deletions(-)
>
> --
> 2.18.1
>
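
The overhead figures quoted in the cover letter can be re-derived with a
few lines of arithmetic. The sketch below is only a sanity check of the
quoted timings; the variable names are illustrative and do not come from
the patchset.

```python
# Timings quoted in the cover letter (seconds to load the test module).
with_accounting = 6.798
without_accounting = 1.758
patched = 4.254

# Memory accounting overhead before and after the patchset.
overhead_before = with_accounting - without_accounting
overhead_after = patched - without_accounting

# Overhead relative to the unaccounted baseline, and how much of that
# overhead the patchset removes.
increase_pct = 100 * overhead_before / without_accounting
reduction_pct = 100 * (1 - overhead_after / overhead_before)

print(f"overhead before: {overhead_before:.3f}s ({increase_pct:.0f}% over baseline)")
print(f"overhead after:  {overhead_after:.3f}s ({reduction_pct:.0f}% reduction)")
```

This reproduces the 5.04s (287%) increase and the roughly 50% reduction
stated above. It says nothing about how the saving splits across the five
patches.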