On Wed 17-04-19 10:50:18, Jesper Dangaard Brouer wrote:
> On Thu, 11 Apr 2019 11:27:26 +0300
> Pekka Enberg <penberg@xxxxxx> wrote:
>
> > Hi,
> >
> > On 4/11/19 10:55 AM, Michal Hocko wrote:
> > > Please please have it more rigorous than what happened when SLUB was
> > > forced to become a default
> >
> > This is the hard part.
> >
> > Even if you are able to show that SLUB is as fast as SLAB for all the
> > benchmarks you run, there's bound to be that one workload where SLUB
> > regresses. You will then have people complaining about that (rightly so)
> > and you're again stuck with two allocators.
> >
> > To move forward, I think we should look at possible *pathological* cases
> > where we think SLAB might have an advantage. For example, SLUB had much
> > more difficulty with remote CPU frees than SLAB. Now I don't know if
> > this is still the case, but it should be easy to construct a synthetic
> > benchmark to measure this.
>
> I do think SLUB has a number of pathological cases where SLAB is
> faster. It was significantly more difficult to get good bulk-free
> performance for SLUB. SLUB is only fast as long as objects belong to
> the same page. To get good bulk-free performance when objects are
> "mixed", I coded this[1] way-too-complex fast-path code to counteract
> this (joint work with Alex Duyck).
>
> [1] https://github.com/torvalds/linux/blob/v5.1-rc5/mm/slub.c#L3033-L3113

How often is this a real problem for real workloads?

> > For example, have a userspace process that does networking, which is
> > often memory-allocation intensive, so that we know that SKBs traverse
> > between CPUs. You can do this by making sure that the NIC queues are
> > mapped to CPU N (so that network softirqs have to run on that CPU) but
> > the process is pinned to CPU M.
>
> If someone wants to test this with SKBs, then be aware that we
> netdev guys have a number of optimizations that try to counteract
> this (as a minimum, disable TSO and GRO).
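For anyone wanting to reproduce this setup, a rough sketch of the pinning
steps (the interface name eth0, the IRQ number 42 and the consumer binary
are placeholders; needs root):

```shell
# Disable the offloads that hide per-SKB allocation cost, so each
# packet exercises the allocator (per Jesper's remark on TSO/GRO).
ethtool -K eth0 tso off gso off gro off

# Steer the NIC's RX interrupt to CPU 0 (bitmask 1), so network
# softirqs, and thus SKB allocation, run there.
echo 1 > /proc/irq/42/smp_affinity

# Pin the consuming process to CPU 1, so SKBs are freed on a
# different CPU than the one they were allocated on.
taskset -c 1 ./my-netserver
```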
> It might also be possible for people to get inspired by, and adapt,
> the micro-benchmarking[2] kernel modules that I wrote when developing
> the SLUB and SLAB optimizations:
>
> [2] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm

While microbenchmarks are good for exposing pathological behavior, I would
be really interested to see some numbers for real-world use cases.

> > It's, of course, worth thinking about other pathological cases too.
> > Workloads that cause large allocations are one. Workloads that cause
> > lots of slab cache shrinking are another.
>
> I also worry about long uptimes where SLUB objects/pages get too
> fragmented... As I said, SLUB is only efficient when objects are
> returned to the same page, while SLAB is not affected this way.

Is this something that has actually been measured in a real deployment?
--
Michal Hocko
SUSE Labs