On Thu, 11 Jan 2024 at 00:01, Andi Kleen <ak@xxxxxxxxxxxxxxx> wrote:
>
> Oscar Salvador <osalvador@xxxxxxx> writes:
> >>
> >> With this change, multiple users can still look up records in parallel.
>
> That's a severe misunderstanding -- rwlocks always bounce a cache line,
> so the parallelism is significantly reduced.
>
> Normally rwlocks are only worth it if your critical region is quite long.
>
> >>
> >> This is a preparatory patch for implementing the eviction of stack records
> >> from the stack depot.
> >>
> >> Reviewed-by: Alexander Potapenko <glider@xxxxxxxxxx>
> >> Signed-off-by: Andrey Konovalov <andreyknvl@xxxxxxxxxx>
> >
> > Reviewed-by: Oscar Salvador <osalvador@xxxxxxx>
>
> Has anyone benchmarked this on a high core count machine? It sounds
> pretty bad if every lock acquisition starts bouncing a single cache line.
>
> Consider using RCU or similar.

stackdepot is severely limited in what kernel facilities it may use,
because it is used by such low-level facilities as the allocator
itself.

I've been suggesting percpu-rwsem here, but looking at it in more
detail, that doesn't work: percpu-rwsem wants to sleep, but
stackdepot must work in non-sleepable contexts. :-/
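To illustrate the cache-line point above, here is a minimal userspace C11 sketch, assuming a toy counter-based rwlock. The names toy_rwlock and WRITER_BIT are hypothetical and this is not the kernel's qrwlock, although the general shape is similar: every reader performs an atomic read-modify-write on one shared lock word. Readers never block each other, yet each lock and unlock still pulls that word's cache line to the local CPU in exclusive state, so many concurrent readers serialize on the line even when the critical section is short.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical toy rwlock for illustration only (not the kernel's qrwlock).
 * One shared word: low bits count readers, WRITER_BIT marks a writer. */
#define WRITER_BIT (1u << 31)

struct toy_rwlock {
	atomic_uint cnt; /* written by every reader and every writer */
};

static void toy_read_lock(struct toy_rwlock *l)
{
	for (;;) {
		/* Even a pure reader writes the shared word here, so the
		 * cache line must move to this CPU in exclusive state. */
		unsigned int prev = atomic_fetch_add_explicit(&l->cnt, 1,
							      memory_order_acquire);
		if (!(prev & WRITER_BIT))
			return;
		/* A writer holds the lock: undo our increment and wait. */
		atomic_fetch_sub_explicit(&l->cnt, 1, memory_order_relaxed);
		while (atomic_load_explicit(&l->cnt, memory_order_relaxed) &
		       WRITER_BIT)
			;
	}
}

static void toy_read_unlock(struct toy_rwlock *l)
{
	/* A second RMW on the same shared word. */
	unsigned int prev = atomic_fetch_sub_explicit(&l->cnt, 1,
						      memory_order_release);
	assert((prev & ~WRITER_BIT) != 0); /* caller must hold a read lock */
	(void)prev;
}

static void toy_write_lock(struct toy_rwlock *l)
{
	unsigned int expected = 0;

	/* Claim the lock only when there are no readers and no writer. */
	while (!atomic_compare_exchange_weak_explicit(&l->cnt, &expected,
						      WRITER_BIT,
						      memory_order_acquire,
						      memory_order_relaxed))
		expected = 0;
}

static void toy_write_unlock(struct toy_rwlock *l)
{
	/* Clear only WRITER_BIT so transient reader increments survive. */
	unsigned int prev = atomic_fetch_sub_explicit(&l->cnt, WRITER_BIT,
						      memory_order_release);
	assert(prev & WRITER_BIT); /* caller must hold the write lock */
	(void)prev;
}
```

By contrast, an RCU read-side lookup writes no shared lock word, which is the motivation behind the RCU suggestion above.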