On 2/28/2024 4:38 AM, Eric Dumazet wrote:
>>
>> sk_prot->memory_allocated points to global atomic variable:
>> atomic_long_t tcp_memory_allocated ____cacheline_aligned_in_smp;
>>
>> If increasing the per-cpu cache size from 1MB to e.g. 16MB,
>> changes to sk->sk_prot->memory_allocated can be further reduced.
>> Performance may be improved on system with many cores.
>
> This looks good, do you have any performance numbers to share ?

I ran a localhost memcached test on a system with 320 CPU threads.
perf shows 4% of cycles spent in
__sk_mem_raise_allocated() --> sk_memory_allocated().

After increasing SK_MEMORY_PCPU_RESERVE to 16MB, the cycles spent in
__sk_mem_raise_allocated() drop to 0.4%.

Thanks,
-adam
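
For readers following the thread, the batching effect under discussion can be sketched in user-space C. This is only an illustration under stated assumptions, not the kernel code: struct pcpu_cache and count_global_updates() are hypothetical names, 4 KB pages are assumed (so 1MB = 256 pages and 16MB = 4096 pages), and a plain loop stands in for one CPU charging one page at a time.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for one CPU's locally cached delta, in pages. */
struct pcpu_cache {
    long fw_alloc;
};

/*
 * Charge n_charges single pages against a local cache and flush the
 * accumulated delta to a shared atomic counter only once it reaches
 * reserve_pages. Returns how many times the shared counter was written,
 * which is the contended operation the thread is trying to reduce.
 */
static long count_global_updates(long n_charges, long reserve_pages)
{
    struct pcpu_cache cache = { 0 };
    atomic_long global_allocated;   /* models the global counter */
    long updates = 0;

    atomic_init(&global_allocated, 0);

    for (long i = 0; i < n_charges; i++) {
        cache.fw_alloc += 1;        /* cheap, CPU-local accounting */
        if (cache.fw_alloc >= reserve_pages) {
            /* Expensive step: touches the shared cache line. */
            atomic_fetch_add(&global_allocated, cache.fw_alloc);
            cache.fw_alloc = 0;
            updates++;
        }
    }
    return updates;
}
```

Under these assumptions, count_global_updates(4096, 256) performs 16 writes to the shared counter, while count_global_updates(4096, 4096) performs 1. The sketch models only the number of shared-counter writes, not the perf percentages reported above.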