Re: [PATCH 0/1] mm: Remove the SLAB allocator

On Wed 17-04-19 10:50:18, Jesper Dangaard Brouer wrote:
> On Thu, 11 Apr 2019 11:27:26 +0300
> Pekka Enberg <penberg@xxxxxx> wrote:
> 
> > Hi,
> > 
> > On 4/11/19 10:55 AM, Michal Hocko wrote:
> > > Please, please be more rigorous than what happened when SLUB was
> > > forced to become the default
> > 
> > This is the hard part.
> > 
> > Even if you are able to show that SLUB is as fast as SLAB for all the 
> > benchmarks you run, there's bound to be that one workload where SLUB 
> > regresses. You will then have people complaining about that (rightly so) 
> > and you're again stuck with two allocators.
> > 
> > To move forward, I think we should look at possible *pathological* cases 
> > where we think SLAB might have an advantage. For example, SLUB had much 
> > more difficulty with remote CPU frees than SLAB. Now I don't know if 
> > this is the case, but it should be easy to construct a synthetic 
> > benchmark to measure this.
> 
> I do think SLUB has a number of pathological cases where SLAB is
> faster.  It was significantly more difficult to get good bulk-free
> performance for SLUB.  SLUB is only fast as long as objects belong to
> the same page.  To get good bulk-free performance when objects are
> "mixed", I coded this[1] way-too-complex fast-path code to counteract
> that (joint work with Alex Duyck).
> 
> [1] https://github.com/torvalds/linux/blob/v5.1-rc5/mm/slub.c#L3033-L3113

How often is this a real problem for real workloads?
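
For concreteness, the bulk API in question is kmem_cache_alloc_bulk() /
kmem_cache_free_bulk(). Below is a minimal sketch of the "mixed" free
pattern that the fast path at [1] has to handle; the cache name, object
size, and interleaving stride are invented for illustration:

#include <linux/module.h>
#include <linux/slab.h>

#define NR_OBJS 64

static void *objs[NR_OBJS], *mixed[NR_OBJS];

static int __init bulkfree_demo_init(void)
{
	struct kmem_cache *c;
	int i, n;

	/* Hypothetical cache: 256-byte objects, several per slab page. */
	c = kmem_cache_create("bulkfree_demo", 256, 0, 0, NULL);
	if (!c)
		return -ENOMEM;

	/* Bulk alloc is all-or-nothing: n is 0 or NR_OBJS. */
	n = kmem_cache_alloc_bulk(c, GFP_KERNEL, NR_OBJS, objs);
	if (!n) {
		kmem_cache_destroy(c);
		return -ENOMEM;
	}

	/*
	 * Permute the objects (stride 7 is coprime to 64, so this is a
	 * bijection) so that consecutive entries tend to sit on different
	 * slab pages; the free_bulk fast path must detect page changes
	 * cheaply instead of assuming one page per batch.
	 */
	for (i = 0; i < n; i++)
		mixed[i] = objs[(i * 7) % n];

	kmem_cache_free_bulk(c, n, mixed);
	kmem_cache_destroy(c);
	return 0;
}

static void __exit bulkfree_demo_exit(void)
{
}

module_init(bulkfree_demo_init);
module_exit(bulkfree_demo_exit);
MODULE_LICENSE("GPL");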

> > For example, have a userspace process that does networking, which is 
> > often memory allocation intensive, so that we know that SKBs traverse 
> > between CPUs. You can do this by making sure that the NIC queues are 
> > mapped to CPU N (so that network softirqs have to run on that CPU) but 
> > the process is pinned to CPU M.
> 
> If someone wants to test this with SKBs, be aware that we netdev guys
> have a number of optimizations that try to counteract this. (At
> minimum, disable TSO and GRO.)
> 
> It might also be possible to get inspired by, and adapt, the
> micro-benchmarking[2] kernel modules that I wrote when developing the
> SLUB and SLAB optimizations:
> 
> [2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm

While microbenchmarks are good for exposing pathological behavior, I
would be really interested to see some numbers for real-world use cases.
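
Along those lines, here is a minimal sketch of what a remote-free
microbenchmark skeleton could look like; the cache name and object size
are invented, it assumes at least two online CPUs, and timing
instrumentation (e.g. ktime_get()) is left out:

#include <linux/module.h>
#include <linux/slab.h>
#include <linux/smp.h>
#include <linux/cpumask.h>
#include <linux/kthread.h>
#include <linux/completion.h>

#define NR_OBJS 1024

static struct kmem_cache *cache;
static void *objs[NR_OBJS];
static DECLARE_COMPLETION(frees_done);

/* Runs pinned to another CPU, so every free below is a remote free. */
static int remote_free_fn(void *unused)
{
	int i;

	for (i = 0; i < NR_OBJS; i++)
		kmem_cache_free(cache, objs[i]);
	complete(&frees_done);
	return 0;
}

static int __init xfree_demo_init(void)
{
	struct task_struct *t;
	int i, cpu;

	if (num_online_cpus() < 2)
		return -ENODEV;	/* need a second CPU to free on */

	cache = kmem_cache_create("xfree_demo", 256, 0, 0, NULL);
	if (!cache)
		return -ENOMEM;

	/* Allocate on whichever CPU loads the module... */
	for (i = 0; i < NR_OBJS; i++) {
		objs[i] = kmem_cache_alloc(cache, GFP_KERNEL);
		if (!objs[i])
			goto err;
	}

	/* ...and free on a different online CPU. */
	cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
	if (cpu >= nr_cpu_ids)
		cpu = cpumask_first(cpu_online_mask);

	t = kthread_create(remote_free_fn, NULL, "xfree_demo");
	if (IS_ERR(t))
		goto err;
	kthread_bind(t, cpu);
	wake_up_process(t);
	wait_for_completion(&frees_done);

	kmem_cache_destroy(cache);
	return 0;
err:
	while (i--)
		kmem_cache_free(cache, objs[i]);
	kmem_cache_destroy(cache);
	return -ENOMEM;
}

static void __exit xfree_demo_exit(void)
{
}

module_init(xfree_demo_init);
module_exit(xfree_demo_exit);
MODULE_LICENSE("GPL");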
 
> > It's, of course, worth thinking about other pathological cases too.
> > Workloads that cause large allocations are one example. Workloads
> > that cause lots of slab cache shrinking are another.
> 
> I also worry about long uptimes where SLUB objects/pages get too
> fragmented... as I said, SLUB is only efficient when objects are
> returned to the same page, while SLAB does not have this restriction.

Is this something that has been actually measured in a real deployment?
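
For anyone who wants to at least reproduce the pattern synthetically, a
minimal sketch; the cache name, object size, and objects-per-page ratio
are invented (order-0 pages assumed), and it of course says nothing
about real deployments:

#include <linux/module.h>
#include <linux/slab.h>

#define NR_OBJS    4096
#define KEEP_EVERY 16	/* ~one survivor per 4K page of 256B objects */

static struct kmem_cache *cache;
static void *objs[NR_OBJS];
static int nr_allocated;

static int __init frag_demo_init(void)
{
	int i;

	cache = kmem_cache_create("frag_demo", 256, 0, 0, NULL);
	if (!cache)
		return -ENOMEM;

	/* Fill many slab pages first... */
	for (i = 0; i < NR_OBJS; i++) {
		objs[i] = kmem_cache_alloc(cache, GFP_KERNEL);
		if (!objs[i])
			break;
	}
	nr_allocated = i;

	/*
	 * ...then free all but every KEEP_EVERY-th object. The survivors
	 * pin their slab pages, which stay mostly empty yet unreclaimable;
	 * compare /proc/slabinfo before and after.
	 */
	for (i = 0; i < nr_allocated; i++) {
		if (i % KEEP_EVERY) {
			kmem_cache_free(cache, objs[i]);
			objs[i] = NULL;
		}
	}
	return 0;
}

static void __exit frag_demo_exit(void)
{
	int i;

	for (i = 0; i < nr_allocated; i++)
		if (objs[i])
			kmem_cache_free(cache, objs[i]);
	kmem_cache_destroy(cache);
}

module_init(frag_demo_init);
module_exit(frag_demo_exit);
MODULE_LICENSE("GPL");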
-- 
Michal Hocko
SUSE Labs



