On 2023-06-22 10:33:13 [-0400], Mathieu Desnoyers wrote:
>
> What was fundamentally wrong with the per-cpu caches before commit
> df323337e50 other than being non-RT friendly ? Was the only purpose of that
> commit to reduce the duration of preempt-off critical sections, or is there
> a bigger picture concern it was taking care of by introducing a global pool
> ?

There were memory allocations within preempt-disabled sections
introduced by get_cpu_ptr(). This didn't fly on PREEMPT_RT. After
looking at this on a 2-node, 64-CPU box I didn't get anywhere near
complete usage of the allocated per-CPU buffers. It looked wasteful.
Based on my testing back then, it looked sufficient to use a global
buffer.

> Introducing per-cpu memory pools, dealing with migration by giving entries
> back to the right cpu's pool, taking into account the cpu the entry belongs
> to, and use a per-cpu/lock-free data structure allowing lock-free push to
> give back an entry on a remote cpu should do the trick without locking, and
> without long preempt-off critical sections.
>
> The only downside I see for per-cpu memory pools is a slightly larger memory
> overhead on large multi-core systems. But is that really a concern ?

Yes, if the memory is left unused and can't be reclaimed if needed.

> What am I missing here ?

I added (tried to add) and people claimed that the SLUB allocator got
better over the years. Also on NUMA systems it might be better to use it
since the memory is NUMA local. The allocation is for 8KiB by default.
Is there a test case/benchmark I could try this vs kmalloc()?

> Thanks,
>
> Mathieu

Sebastian
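
For illustration, the lock-free give-back that Mathieu describes could be
sketched in user space roughly as below. This is only a sketch under
assumptions, not code from this thread or from the commit discussed: the
names (struct pool_entry, struct cpu_pool, pool_put(), pool_get(),
pool_entry_alloc()) and NR_POOLS are hypothetical, and C11 atomics stand in
for whatever primitive the kernel side would use. Remote CPUs push with a
compare-and-swap; the owner detaches the whole return list with one
exchange, which sidesteps the ABA problem a per-entry lock-free pop would
have.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_POOLS 4	/* hypothetical number of per-CPU pools */
#define BUF_SIZE 8192	/* 8KiB, the default allocation size mentioned above */

struct pool_entry {
	struct pool_entry *next;
	int home_cpu;		/* pool this entry has to be returned to */
	char buf[BUF_SIZE];
};

struct cpu_pool {
	struct pool_entry *local;		/* touched by the owner only */
	_Atomic(struct pool_entry *) remote;	/* lock-free return list */
};

static struct cpu_pool pools[NR_POOLS];

/* Allocate a fresh entry outside of any critical section. */
static struct pool_entry *pool_entry_alloc(int cpu)
{
	struct pool_entry *e = malloc(sizeof(*e));

	if (e) {
		e->next = NULL;
		e->home_cpu = cpu;
	}
	return e;
}

/* Any CPU: give an entry back to its home pool without taking a lock. */
static void pool_put(struct pool_entry *e)
{
	struct cpu_pool *p;
	struct pool_entry *head;

	assert(e->home_cpu >= 0 && e->home_cpu < NR_POOLS);
	p = &pools[e->home_cpu];
	head = atomic_load_explicit(&p->remote, memory_order_relaxed);
	do {
		e->next = head;	/* head is refreshed when the CAS fails */
	} while (!atomic_compare_exchange_weak_explicit(&p->remote, &head, e,
							memory_order_release,
							memory_order_relaxed));
}

/*
 * Owner only: take an entry, refilling from the return list if needed.
 * Returns NULL if the pool is empty; the caller then allocates on its own.
 */
static struct pool_entry *pool_get(int cpu)
{
	struct cpu_pool *p = &pools[cpu];
	struct pool_entry *e;

	if (!p->local)
		/* Detach the complete return list in one step. */
		p->local = atomic_exchange_explicit(&p->remote, NULL,
						    memory_order_acquire);
	e = p->local;
	if (e)
		p->local = e->next;
	return e;
}
```

Note that only the give-back side is lock-free here. pool_get() still
assumes the caller cannot migrate while it touches ->local, so the owner
side needs some short protection of its own, and an empty pool returns
NULL so that the allocation happens outside of that section. It also does
nothing about reclaiming unused entries.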