On Mon, May 01, 2023 at 11:14:45AM -0700, Roman Gushchin wrote:
> It's a good idea and I generally think that +25-35% for kmalloc/pgalloc
> should be ok for the production use, which is great!
> In the reality, most workloads are not that sensitive to the speed of
> memory allocation. :)

My main takeaway has been "the slub fast path is _really_ fast". No
disabling of preemption, no locked atomic instructions, just an unlocked
double-word cmpxchg - it's a slick piece of work.

> > For kmalloc, the overhead is low because after we create the vector of
> > slab_ext objects (which is the same as what memcg_kmem does), memory
> > profiling just increments a lazy counter (which in many cases would be
> > a per-cpu counter).
>
> So does kmem (this is why I'm somewhat surprised by the difference).
>
> > memcg_kmem operates on cgroup hierarchy with
> > additional overhead associated with that. I'm guessing that's the
> > reason for the big difference between these mechanisms but, I didn't
> > look into the details to understand memcg_kmem performance.
>
> I suspect recent rt-related changes and also the wide usage of
> rcu primitives in the kmem code. I'll try to look closer as well.

Happy to give you something to compare against :)
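
For anyone who wants to see the shape of the fast path I mean, here's a
rough userspace sketch - emphatically not the real slub code. The names
(slub_fastpath, fastpath_alloc, cpu_slab, freelist_obj) are made up, and a
portable demo has to use a genuinely atomic 16-byte CAS, whereas the
kernel's per-cpu double-word cmpxchg can skip the LOCK prefix because the
tid catches preemption and migration. It only illustrates popping a
freelist with a single {pointer, tid} double-word compare-and-swap; the
real thing lives in mm/slub.c.

/*
 * Userspace sketch of the lockless freelist pop: a {freelist head, tid}
 * pair updated with one double-word compare-and-swap.  Build with e.g.
 * gcc -O2 -mcx16 slub_fastpath_sketch.c -latomic (whether the 16-byte CAS
 * is inlined or goes through libatomic depends on the compiler).
 */
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

struct freelist_obj {
	struct freelist_obj *next;
};

/* Stand-in for the per-cpu {freelist, tid} pair. */
struct slub_fastpath {
	struct freelist_obj *freelist;
	uintptr_t tid;
};

_Static_assert(sizeof(struct slub_fastpath) == sizeof(__int128),
	       "pair must fit in one double-word CAS");

static _Atomic(__int128) cpu_slab;

static __int128 pack(struct slub_fastpath f)
{
	__int128 v;
	memcpy(&v, &f, sizeof(v));
	return v;
}

static struct slub_fastpath unpack(__int128 v)
{
	struct slub_fastpath f;
	memcpy(&f, &v, sizeof(f));
	return f;
}

static struct freelist_obj *fastpath_alloc(void)
{
	__int128 old = atomic_load_explicit(&cpu_slab, memory_order_relaxed);
	struct slub_fastpath cur, next;

	do {
		cur = unpack(old);
		if (!cur.freelist)
			return NULL;	/* real code falls back to the slow path */
		next.freelist = cur.freelist->next;
		next.tid = cur.tid + 1;	/* stand-in for the kernel's tid bump */
	} while (!atomic_compare_exchange_weak(&cpu_slab, &old, pack(next)));

	return cur.freelist;
}

int main(void)
{
	struct freelist_obj objs[4];

	/* Build a tiny freelist: objs[0] -> objs[1] -> ... -> NULL */
	for (int i = 0; i < 3; i++)
		objs[i].next = &objs[i + 1];
	objs[3].next = NULL;

	struct slub_fastpath init = { .freelist = &objs[0], .tid = 0 };
	atomic_store(&cpu_slab, pack(init));

	for (struct freelist_obj *o; (o = fastpath_alloc()); )
		printf("allocated %p\n", (void *)o);
	return 0;
}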