I'm still running tests to explore some of these questions.
The machines I am using are roughly as follows.

Intel dual socket 56 total cores
192-384GB ram
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_DCACHE_SIZE                 32768
LEVEL2_CACHE_SIZE                  1048576
LEVEL3_CACHE_SIZE                  40370176

Amd dual socket 128 total cores
1TB ram
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_DCACHE_SIZE                 32768
LEVEL2_CACHE_SIZE                  524288
LEVEL3_CACHE_SIZE                  268435456

Arm single socket 64 total cores
256GB ram
LEVEL1_ICACHE_SIZE                 65536
LEVEL1_DCACHE_SIZE                 65536
LEVEL2_CACHE_SIZE                  1048576
LEVEL3_CACHE_SIZE                  33554432

On Tue, Apr 4, 2023 at 12:03 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 3/22/23 13:30, Binder Makin wrote:
> > Was looking at SLAB removal and started by running A/B tests of SLAB
> > vs SLUB. Please note these are only preliminary results.
>
> Thanks, that's very useful.
>
> > These were run using 6.1.13 configured for SLAB/SLUB.
> > Machines were standard datacenter servers.
> >
> > Hackbench shows completion time, so smaller is better.
> > On all others larger is better.
> > https://docs.google.com/spreadsheets/d/e/2PACX-1vQ47Mekl8BOp3ekCefwL6wL8SQiv6Qvp5avkU2ssQSh41gntjivE-aKM4PkwzkC4N_s_MxUdcsokhhz/pubhtml
> >
> > Some notes:
> > SUnreclaim and SReclaimable shows unreclaimable and reclaimable memory.
> > Substantially higher with SLUB, but I believe that is to be expected.
> >
> > Various results showing a 5-10% degradation with SLUB. That feels
> > concerning to me, but I'm not sure what others' tolerance would be.
> >
> > redis results on AMD show some pretty bad degradations. 10-20% range
> > netpipe on Intel also has issues.. 10-17%
>
> I guess one question is which ones are genuine SLAB/SLUB differences and not
> e.g. some artifact of different cache layout or something. For example it
> seems suspicious if results are widely different between architectures.
>
> E.g. will-it-scale writeseek3_scalability regresses on arm64 and amd, but
> improves on intel? Or is something wrong with the data, all columns for that
> whole benchmark suite are identical.
>
> hackbench ("smaller is better") seems drastically better on arm64 (30%
> median time reduction?) and amd (80% reduction?!?), but 10% slower intel?
>
> redis seems a bit improved on arm64, slightly worse on intel but much worse
> on amd.
>
> specjbb similar story, also I thought it was a java focused benchmark,
> should it really be exercising kernel slab allocators in such notable way?
>
> I guess netpipe is the least surprising as networking was always mentioned
> in SLAB vs SLUB discussions.
>
> > On Tue, Mar 14, 2023 at 4:05 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>
> >> As you're probably aware, my plan is to get rid of SLOB and SLAB, leaving
> >> only SLUB going forward. The removal of SLOB seems to be going well, there
> >> were no objections to the deprecation and I've posted v1 of the removal
> >> itself [1] so it could be in -next soon.
> >>
> >> The immediate benefit of that is that we can allow kfree() (and kfree_rcu())
> >> to free objects from kmem_cache_alloc() - something that IIRC at least xfs
> >> people wanted in the past, and SLOB was incompatible with that.
> >>
> >> For SLAB removal I haven't yet heard any objections (but also didn't
> >> deprecate it yet) but if there are any users due to particular workloads
> >> doing better with SLAB than SLUB, we can discuss why those would regress and
> >> what can be done about that in SLUB.
> >>
> >> Once we have just one slab allocator in the kernel, we can take a closer
> >> look at what the users are missing from it that forces them to create own
> >> allocators (e.g. BPF), and could be considered to be added as a generic
> >> implementation to SLUB.
> >>
> >> Thanks,
> >> Vlastimil
> >>
> >> [1] https://lore.kernel.org/all/20230310103210.22372-1-vbabka@xxxxxxx/
> >>
>