Re: [LSF/MM/BPF TOPIC] SLOB+SLAB allocators removal and future SLUB improvements

Vlastimil Babka <vbabka@xxxxxxx> · Tue, 4 Apr 2023 18:03:07 +0200

On 3/22/23 13:30, Binder Makin wrote:
> Was looking at SLAB removal and started by running A/B tests of SLAB
> vs SLUB.  Please note these are only preliminary results.

Thanks, that's very useful.

> These were run using 6.1.13 configured for SLAB/SLUB.
> Machines were standard datacenter servers.
> 
> Hackbench shows completion time, so smaller is better.
> On all others larger is better.
> https://docs.google.com/spreadsheets/d/e/2PACX-1vQ47Mekl8BOp3ekCefwL6wL8SQiv6Qvp5avkU2ssQSh41gntjivE-aKM4PkwzkC4N_s_MxUdcsokhhz/pubhtml
> 
> Some notes:
> SUnreclaim and SReclaimable show unreclaimable and reclaimable memory.
> Both are substantially higher with SLUB, but I believe that is to be expected.
> 
> Various results showing a 5-10% degradation with SLUB.  That feels
> concerning to me, but I'm not sure what others' tolerance would be.
> 
> redis results on AMD show some pretty bad degradations, in the 10-20% range.
> netpipe on Intel also has issues: 10-17%.

I guess one question is which ones are genuine SLAB/SLUB differences and not
e.g. some artifact of different cache layout or something. For example it
seems suspicious if results are widely different between architectures.

E.g. will-it-scale writeseek3_scalability regresses on arm64 and amd, but
improves on intel? Or is something wrong with the data? All columns for that
whole benchmark suite are identical.

hackbench ("smaller is better") seems drastically better on arm64 (30%
median time reduction?) and amd (80% reduction?!?), but 10% slower on intel?
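
To make sure the signs are read the same way for "smaller is better" and
"larger is better" benchmarks, here is a rough sketch of the arithmetic. The
helper names and the sample times are made up for illustration and are not
taken from the spreadsheet.

```python
from statistics import median


def median_change_pct(baseline, candidate):
    """Percent change of the candidate median relative to the baseline median."""
    base = median(baseline)
    return (median(candidate) - base) / base * 100.0


def is_improvement(change_pct, smaller_is_better):
    """Interpret the sign of the change according to the benchmark's direction."""
    return change_pct < 0 if smaller_is_better else change_pct > 0


# Hypothetical hackbench completion times in seconds (NOT real measurements).
slab_times = [10.0, 10.4, 9.8, 10.2, 10.1]
slub_times = [7.1, 6.9, 7.3, 7.0, 7.2]

change = median_change_pct(slab_times, slub_times)
print(f"median change: {change:+.1f}%")
print("improvement with SLUB:", is_improvement(change, smaller_is_better=True))
```

A negative change counts as an improvement only for completion-time style
results such as hackbench; for throughput results the sign flips.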

redis seems a bit improved on arm64, slightly worse on intel but much worse
on amd.

specjbb is a similar story. Also, I thought it was a Java-focused benchmark;
should it really be exercising kernel slab allocators in such a notable way?

I guess netpipe is the least surprising as networking was always mentioned
in SLAB vs SLUB discussions.

> On Tue, Mar 14, 2023 at 4:05 AM Vlastimil Babka <vbabka@xxxxxxx> wrote:
>>
>> As you're probably aware, my plan is to get rid of SLOB and SLAB, leaving
>> only SLUB going forward. The removal of SLOB seems to be going well, there
>> were no objections to the deprecation and I've posted v1 of the removal
>> itself [1] so it could be in -next soon.
>>
>> The immediate benefit of that is that we can allow kfree() (and kfree_rcu())
>> to free objects from kmem_cache_alloc() - something that IIRC at least xfs
>> people wanted in the past, and SLOB was incompatible with that.
>>
>> For SLAB removal I haven't yet heard any objections (but also didn't
>> deprecate it yet) but if there are any users due to particular workloads
>> doing better with SLAB than SLUB, we can discuss why those would regress and
>> what can be done about that in SLUB.
>>
>> Once we have just one slab allocator in the kernel, we can take a closer
>> look at what the users are missing from it that forces them to create their
>> own allocators (e.g. BPF), and what could be considered for addition as a
>> generic implementation in SLUB.
>>
>> Thanks,
>> Vlastimil
>>
>> [1] https://lore.kernel.org/all/20230310103210.22372-1-vbabka@xxxxxxx/
>>