Re: [PATCH 0/1] mm: Remove the SLAB allocator

Jesper Dangaard Brouer <netdev@xxxxxxxxxx> · Wed, 17 Apr 2019 10:50:18 +0200

On Thu, 11 Apr 2019 11:27:26 +0300
Pekka Enberg <penberg@xxxxxx> wrote:

> Hi,
> 
> On 4/11/19 10:55 AM, Michal Hocko wrote:
> > Please please have it more rigorous then what happened when SLUB was
> > forced to become a default  
> 
> This is the hard part.
> 
> Even if you are able to show that SLUB is as fast as SLAB for all the 
> benchmarks you run, there's bound to be that one workload where SLUB 
> regresses. You will then have people complaining about that (rightly so) 
> and you're again stuck with two allocators.
> 
> To move forward, I think we should look at possible *pathological* cases 
> where we think SLAB might have an advantage. For example, SLUB had much 
> more difficulties with remote CPU frees than SLAB. Now I don't know if 
> this is the case, but it should be easy to construct a synthetic 
> benchmark to measure this.

I do think SLUB has a number of pathological cases where SLAB is
faster.  It was significantly more difficult to get good bulk-free
performance for SLUB.  SLUB is only fast as long as objects belong to
the same page.  To get good bulk-free performance when objects are
"mixed", I coded this[1] way-too-complex fast-path code to counteract
this (joint work with Alex Duyck).

[1] https://github.com/torvalds/linux/blob/v5.1-rc5/mm/slub.c#L3033-L3113
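
To make the same-page point concrete, here is a rough userspace model
(not the kernel code in [1]) of how a bulk free can be split into
per-page batches.  The names model_take_batch() and
model_count_batches(), the 4 KiB page size and the lookahead of 3 are
assumptions made for illustration; 0 marks an already-consumed slot, so
object addresses must be non-zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SIZE 4096u /* assumption: one slab is one 4 KiB page */
#define MODEL_LOOKAHEAD 3     /* assumption: foreign objects tolerated per scan */

static uintptr_t model_page_of(uintptr_t obj)
{
	return obj / MODEL_PAGE_SIZE;
}

/*
 * Take one batch from the tail of objs[0..size): the last live object
 * plus every earlier object on the same page that is reached before
 * MODEL_LOOKAHEAD foreign objects have been seen.  Consumed slots are
 * set to 0.  Returns the array size the caller should continue with
 * (0 when nothing is left) and sets *took to 1 if a batch was formed.
 */
static size_t model_take_batch(uintptr_t *objs, size_t size, int *took)
{
	size_t first_skipped = 0;
	int lookahead = MODEL_LOOKAHEAD;
	uintptr_t page;

	*took = 0;
	/* Skip tail slots that an earlier batch already consumed. */
	while (size > 0 && objs[size - 1] == 0)
		size--;
	if (size == 0)
		return 0;

	page = model_page_of(objs[--size]);
	objs[size] = 0;
	*took = 1;

	while (size > 0) {
		uintptr_t obj = objs[--size];

		if (obj == 0)
			continue;		/* already consumed */
		if (model_page_of(obj) == page) {
			objs[size] = 0;		/* joins the current batch */
			continue;
		}
		if (first_skipped == 0)
			first_skipped = size + 1; /* resume here next time */
		if (--lookahead == 0)
			break;			/* stop scanning, too mixed */
	}
	return first_skipped;
}

/* Count how many per-page batches a bulk free of n objects turns into. */
static size_t model_count_batches(uintptr_t *objs, size_t n)
{
	size_t batches = 0;

	assert(objs != NULL || n == 0);
	while (n > 0) {
		int took;

		n = model_take_batch(objs, n, &took);
		batches += (size_t)took;
	}
	return batches;
}
```

In this model, six objects laid out on two pages collapse into two
batches, while eight objects interleaved across four pages in the order
1,2,3,4,1,2,3,4 need eight batches, one per object.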

> For example, have a userspace process that does networking, which is 
> often memory allocation intensive, so that we know that SKBs traverse 
> between CPUs. You can do this by making sure that the NIC queues are 
> mapped to CPU N (so that network softirqs have to run on that CPU) but 
> the process is pinned to CPU M.

If someone wants to test this with SKBs, then be aware that we netdev guys
have a number of optimizations where we try to counteract this. (As a
minimum, disable TSO and GRO.)
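
For the process side of the setup described above, a minimal sketch of
pinning the calling thread to a single CPU with sched_setaffinity(2)
could look as follows.  The helper names pin_self_to_cpu() and
first_allowed_cpu() are hypothetical; steering the NIC queue IRQs to a
different CPU is a separate step that is not shown here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/*
 * Pin the calling thread to exactly one CPU.
 * Returns 0 on success, -1 on error (errno is set by the syscall).
 */
static int pin_self_to_cpu(int cpu)
{
	cpu_set_t set;

	assert(cpu >= 0 && cpu < CPU_SETSIZE);
	CPU_ZERO(&set);
	CPU_SET(cpu, &set);
	/* pid 0 means "the calling thread" */
	return sched_setaffinity(0, sizeof(set), &set);
}

/*
 * Return the lowest-numbered CPU the calling thread may currently run
 * on, or -1 if the affinity mask cannot be read or is empty.
 */
static int first_allowed_cpu(void)
{
	cpu_set_t set;

	if (sched_getaffinity(0, sizeof(set), &set) != 0)
		return -1;
	for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
		if (CPU_ISSET(cpu, &set))
			return cpu;
	return -1;
}
```

A test program would call pin_self_to_cpu(M) before it starts receiving,
with M chosen to differ from the CPU N that handles the NIC queue.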

People might also take inspiration from, and adapt, the
micro-benchmarking[2] kernel modules that I wrote when developing the
SLUB and SLAB optimizations:

[2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm

> It's, of course, worth thinking about other pathological cases too. 
> Workloads that cause large allocations is one. Workloads that cause lots 
> of slab cache shrinking is another.

I also worry about long uptimes, when SLUB objects/pages get too
fragmented... as I said, SLUB is only efficient when objects are
returned to the same page, while SLAB does not have this restriction.

I did a comparison of bulk FREE performance here (where SLAB is
slightly faster):
  Commit ca257195511d ("mm: new API kfree_bulk() for SLAB+SLUB allocators")
  [3] https://git.kernel.org/torvalds/c/ca257195511d

You might also notice how simple the SLAB code is:
  Commit e6cdb58d1c83 ("slab: implement bulk free in SLAB allocator")
  [4] https://git.kernel.org/torvalds/c/e6cdb58d1c83

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer