On Wed, Jun 17, 2020 at 03:31:10PM +0100, Mel Gorman wrote:
> On Wed, Jun 17, 2020 at 01:24:21PM +0200, Vlastimil Babka wrote:
> > > Not really.
> > >
> > > Sharing a single set of caches adds some overhead to root- and
> > > non-accounted allocations, which is something I've tried hard to avoid
> > > in my original version. But I have to admit, it allows us to simplify
> > > and remove a lot of code, and here it's hard to argue with Johannes,
> > > who pushed for this design.
> > >
> > > Performance testing isn't that easy, because it's not obvious what we
> > > want to test. Obviously, per-object accounting is more expensive, and
> > > measuring something like 1000000 allocations and deallocations in a
> > > row from a single kmem_cache will show a regression. But in the real
> > > world the relative cost of allocations is usually low, and we can get
> > > some benefits from a smaller working set and from having shared
> > > kmem_cache objects cache hot. Not to mention the extra memory savings
> > > and the fragmentation reduction.
> > >
> > > We've done extensive testing of the original version in Facebook
> > > production, and we haven't noticed any regressions so far. But I have
> > > to admit, we were using an original version with two sets of
> > > kmem_caches.
> > >
> > > If you have any specific tests in mind, I can definitely run them. Or
> > > if you can help with the performance evaluation, I'll appreciate it a
> > > lot.
> >
> > Jesper provided some pointers here [1]; it would be really great if you
> > could run at least those microbenchmarks. With mmtests the major
> > question is which subset/profiles to run; maybe the referenced commits
> > provide some hints, or maybe Mel could suggest what he used to evaluate
> > SLAB vs SLUB not so long ago.
> >
>
> Last time, the list of mmtests configurations I used for a basic
> comparison was
>
> db-pgbench-timed-ro-small-ext4
> db-pgbench-timed-ro-small-xfs
> io-dbench4-async-ext4
> io-dbench4-async-xfs
> io-bonnie-dir-async-ext4
> io-bonnie-dir-async-xfs
> io-bonnie-file-async-ext4
> io-bonnie-file-async-xfs
> io-fsmark-xfsrepair-xfs
> io-metadata-xfs
> network-netperf-unbound
> network-netperf-cross-node
> network-netperf-cross-socket
> network-sockperf-unbound
> network-netperf-unix-unbound
> network-netpipe
> network-tbench
> pagereclaim-shrinker-ext4
> scheduler-unbound
> scheduler-forkintensive
> workload-kerndevel-xfs
> workload-thpscale-madvhugepage-xfs
> workload-thpscale-xfs
>
> Some were more valid than others in terms of doing an evaluation. I
> followed up later with a more comprehensive comparison, but that was
> overkill.
>
> Each time I did a slab/slub comparison in the past, I had to reverify
> the rate at which kmem_cache_* functions were actually being called, as
> the pattern can change over time even for the same workload. A
> comparison gets more complicated when comparing cgroups, as ideally
> there would be workloads running in multiple groups, but that gets
> complex and I think it's reasonable to just test the "basic" case
> without cgroups.

Thank you, Mel, for the suggestion! I'll try to come up with some numbers
soon. I guess the networking tests will be the most interesting ones in
this case.

Thanks!

Roman
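
P.S. For concreteness, the kind of tight alloc/free loop mentioned above
(a million kmem_cache_alloc()/kmem_cache_free() pairs on a single cache)
can be sketched as a trivial one-shot kernel module. This is only a rough
illustration of the worst-case microbenchmark being discussed, not Jesper's
actual benchmark; the cache name, object size, iteration count, and the
fail-on-load trick are all illustrative choices.

    #include <linux/module.h>
    #include <linux/slab.h>
    #include <linux/ktime.h>

    static struct kmem_cache *bench_cache;

    static int __init slab_bench_init(void)
    {
            ktime_t start, end;
            void *obj;
            int i;

            /* Illustrative cache: 256-byte objects, memcg-accounted. */
            bench_cache = kmem_cache_create("bench_cache", 256, 0,
                                            SLAB_ACCOUNT, NULL);
            if (!bench_cache)
                    return -ENOMEM;

            start = ktime_get();
            for (i = 0; i < 1000000; i++) {
                    obj = kmem_cache_alloc(bench_cache, GFP_KERNEL);
                    if (!obj)
                            break;
                    kmem_cache_free(bench_cache, obj);
            }
            end = ktime_get();

            pr_info("slab_bench: %d alloc/free pairs in %lld ns\n",
                    i, ktime_to_ns(ktime_sub(end, start)));

            kmem_cache_destroy(bench_cache);

            /* Fail loading on purpose so the module doesn't stay resident. */
            return -EAGAIN;
    }

    module_init(slab_bench_init);
    MODULE_LICENSE("GPL");

Because each object is freed immediately after it is allocated, the loop
mostly hits the per-cpu freelist, which is exactly the scenario where the
per-object accounting overhead is most visible and real workloads are
expected to look much better.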