On Mon, May 01, 2023 at 11:14:45AM -0700, Roman Gushchin wrote:
> It's a good idea and I generally think that +25-35% for kmalloc/pgalloc
> should be ok for the production use, which is great!
> In the reality, most workloads are not that sensitive to the speed of
> memory allocation. :)

My main takeaway has been "the slub fast path is _really_ fast". No
disabling of preemption, no locked atomic instructions, just an unlocked
double-word cmpxchg - it's a slick piece of work.

> > For kmalloc, the overhead is low because after we create the vector of
> > slab_ext objects (which is the same as what memcg_kmem does), memory
> > profiling just increments a lazy counter (which in many cases would be
> > a per-cpu counter).
>
> So does kmem (this is why I'm somewhat surprised by the difference).
>
> > memcg_kmem operates on cgroup hierarchy with
> > additional overhead associated with that. I'm guessing that's the
> > reason for the big difference between these mechanisms but, I didn't
> > look into the details to understand memcg_kmem performance.
>
> I suspect recent rt-related changes and also the wide usage of
> rcu primitives in the kmem code. I'll try to look closer as well.

Happy to give you something to compare against :)
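
For anyone who wants to see the shape of the fast path I mean, here's a
rough userspace sketch - emphatically not the real slub code. The names
(slub_fastpath, fastpath_alloc, cpu_slab, freelist_obj) are made up, and a
portable demo has to use a genuinely atomic 16-byte CAS, whereas the
kernel's per-cpu double-word cmpxchg can skip the LOCK prefix because the
tid catches preemption and migration. It only illustrates popping a
freelist with a single {pointer, tid} double-word compare-and-swap; the
real thing lives in mm/slub.c.

/*
 * Userspace sketch of the lockless freelist pop: a {freelist head, tid}
 * pair updated with one double-word compare-and-swap.  Build with e.g.
 * gcc -O2 -mcx16 slub_fastpath_sketch.c -latomic (whether the 16-byte CAS
 * is inlined or goes through libatomic depends on the compiler).
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct freelist_obj {
	struct freelist_obj *next;
};

/* Stand-in for the per-cpu {freelist, tid} pair. */
struct slub_fastpath {
	struct freelist_obj *freelist;
	uintptr_t tid;
};

_Static_assert(sizeof(struct slub_fastpath) == sizeof(__int128),
	       "pair must fit in one double-word CAS");

static _Atomic(__int128) cpu_slab;

static __int128 pack(struct slub_fastpath f)
{
	__int128 v;
	memcpy(&v, &f, sizeof(v));
	return v;
}

static struct slub_fastpath unpack(__int128 v)
{
	struct slub_fastpath f;
	memcpy(&f, &v, sizeof(f));
	return f;
}

static struct freelist_obj *fastpath_alloc(void)
{
	__int128 old = atomic_load_explicit(&cpu_slab, memory_order_relaxed);
	struct slub_fastpath cur, next;

	do {
		cur = unpack(old);
		if (!cur.freelist)
			return NULL;	/* real code falls back to the slow path */
		next.freelist = cur.freelist->next;
		next.tid = cur.tid + 1;	/* stand-in for the kernel's tid bump */
	} while (!atomic_compare_exchange_weak(&cpu_slab, &old, pack(next)));

	return cur.freelist;
}

int main(void)
{
	struct freelist_obj objs[4];

	/* Build a tiny freelist: objs[0] -> objs[1] -> ... -> NULL */
	for (int i = 0; i < 3; i++)
		objs[i].next = &objs[i + 1];
	objs[3].next = NULL;

	struct slub_fastpath init = { .freelist = &objs[0], .tid = 0 };
	atomic_store(&cpu_slab, pack(init));

	for (struct freelist_obj *o; (o = fastpath_alloc()); )
		printf("allocated %p\n", (void *)o);
	return 0;
}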