Re: [LSF/MM/BPF TOPIC] SLOB+SLAB allocators removal and future SLUB improvements

Binder Makin <merimus@xxxxxxxxxx> · Wed, 5 Apr 2023 15:54:45 -0400

I'm still running tests to explore some of these questions.
The machines I am using are roughly as follows.

Intel, dual socket, 56 cores total
192-384GB RAM
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_DCACHE_SIZE                 32768
LEVEL2_CACHE_SIZE                  1048576
LEVEL3_CACHE_SIZE                  40370176

AMD, dual socket, 128 cores total
1TB RAM
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_DCACHE_SIZE                 32768
LEVEL2_CACHE_SIZE                  524288
LEVEL3_CACHE_SIZE                  268435456

Arm, single socket, 64 cores total
256GB RAM
LEVEL1_ICACHE_SIZE                 65536
LEVEL1_DCACHE_SIZE                 65536
LEVEL2_CACHE_SIZE                  1048576
LEVEL3_CACHE_SIZE                  33554432
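
For comparing the three machines, the byte counts above are easier to scan in
binary units. The listing resembles `getconf -a` output, though that is an
assumption; the message does not say how the values were collected. The helper
names below (`format_size`, `humanize_cache_sizes`) are made up for
illustration, and the sample text is the Intel block copied from above.

```python
def format_size(num_bytes):
    """Render a byte count in binary units (MiB or KiB)."""
    for unit, factor in (("MiB", 1 << 20), ("KiB", 1 << 10)):
        if num_bytes >= factor:
            # :g drops a trailing ".0" so whole values print as integers
            return f"{num_bytes / factor:g} {unit}"
    return f"{num_bytes} B"


def humanize_cache_sizes(text):
    """Parse 'NAME  VALUE' lines and return {name: human-readable size}."""
    sizes = {}
    for line in text.splitlines():
        parts = line.split()
        # Skip anything that is not exactly a name followed by a byte count
        if len(parts) == 2 and parts[1].isdigit():
            sizes[parts[0]] = format_size(int(parts[1]))
    return sizes


# Intel block from the message above
INTEL = """\
LEVEL1_ICACHE_SIZE                 32768
LEVEL1_DCACHE_SIZE                 32768
LEVEL2_CACHE_SIZE                  1048576
LEVEL3_CACHE_SIZE                  40370176
"""

if __name__ == "__main__":
    for name, size in humanize_cache_sizes(INTEL).items():
        print(f"{name:20s} {size}")
```

The same function can be fed the AMD and Arm blocks unchanged.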

On Tue, Apr 4, 2023 at 12:03 PM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>
> On 3/22/23 13:30, Binder Makin wrote:
> > Was looking at SLAB removal and started by running A/B tests of SLAB
> > vs SLUB.  Please note these are only preliminary results.
>
> Thanks, that's very useful.
>
> > These were run using 6.1.13 configured for SLAB/SLUB.
> > Machines were standard datacenter servers.
> >
> > Hackbench shows completion time, so smaller is better.
> > On all others larger is better.
> > https://docs.google.com/spreadsheets/d/e/2PACX-1vQ47Mekl8BOp3ekCefwL6wL8SQiv6Qvp5avkU2ssQSh41gntjivE-aKM4PkwzkC4N_s_MxUdcsokhhz/pubhtml
> >
> > Some notes:
> > SUnreclaim and SReclaimable show unreclaimable and reclaimable slab memory.
> > Both are substantially higher with SLUB, but I believe that is to be expected.
> >
> > Various results show a 5-10% degradation with SLUB.  That feels
> > concerning to me, but I'm not sure what others' tolerance would be.
> >
> > redis results on AMD show some pretty bad degradations, in the 10-20% range.
> > netpipe on Intel also has issues, 10-17%.
>
> I guess one question is which ones are genuine SLAB/SLUB differences and not
> e.g. some artifact of different cache layout or something. For example it
> seems suspicious if results are widely different between architectures.
>
> E.g. will-it-scale writeseek3_scalability regresses on arm64 and amd, but
> improves on intel? Or is something wrong with the data? All columns for
> that whole benchmark suite are identical.
>
> hackbench ("smaller is better") seems drastically better on arm64 (30%
> median time reduction?) and amd (80% reduction?!?), but 10% slower on intel?
>
> redis seems a bit improved on arm64, slightly worse on intel but much worse
> on amd.
>
> specjbb is a similar story. Also, I thought it was a Java-focused benchmark;
> should it really be exercising kernel slab allocators in such a notable way?
>
> I guess netpipe is the least surprising as networking was always mentioned
> in SLAB vs SLUB discussions.
>
> > On Tue, Mar 14, 2023 at 4:05 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
> >>
> >> As you're probably aware, my plan is to get rid of SLOB and SLAB, leaving
> >> only SLUB going forward. The removal of SLOB seems to be going well: there
> >> were no objections to the deprecation and I've posted v1 of the removal
> >> itself [1], so it could be in -next soon.
> >>
> >> The immediate benefit of that is that we can allow kfree() (and kfree_rcu())
> >> to free objects from kmem_cache_alloc() - something that IIRC at least xfs
> >> people wanted in the past, and SLOB was incompatible with that.
> >>
> >> For SLAB removal I haven't yet heard any objections (but also didn't
> >> deprecate it yet) but if there are any users due to particular workloads
> >> doing better with SLAB than SLUB, we can discuss why those would regress and
> >> what can be done about that in SLUB.
> >>
> >> Once we have just one slab allocator in the kernel, we can take a closer
> >> look at what the users are missing from it that forces them to create their
> >> own allocators (e.g. BPF), and what could be added to SLUB as a generic
> >> implementation.
> >>
> >> Thanks,
> >> Vlastimil
> >>
> >> [1] https://lore.kernel.org/all/20230310103210.22372-1-vbabka@xxxxxxx/
> >>
>
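
Because the quoted results mix a smaller-is-better metric (hackbench completion
time) with larger-is-better ones, one way to put them on a single scale is to
normalize the sign so that positive always means SLUB did better. A minimal
sketch follows; the function name `relative_change` and the sample numbers are
made up for illustration and are not taken from the spreadsheet.

```python
def relative_change(baseline, candidate, higher_is_better=True):
    """Return the signed percent change of candidate vs. baseline.

    The sign is normalized so that a positive result always means the
    candidate is better, whichever direction the metric runs.
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    change = (candidate - baseline) * 100.0 / baseline
    return change if higher_is_better else -change


# Hypothetical SLAB (baseline) vs. SLUB (candidate) values, NOT real results.
samples = [
    ("throughput-style, larger is better", 1000.0, 900.0, True),
    ("hackbench-style, smaller is better", 10.0, 7.0, False),
]

if __name__ == "__main__":
    for name, slab, slub, higher in samples:
        pct = relative_change(slab, slub, higher_is_better=higher)
        print(f"{name}: {pct:+.1f}%")
```

With this convention a drop in throughput and a rise in completion time both
come out negative, so regressions line up across benchmarks.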