On Tue, 26 May 2020 16:55:05 +0200
Vlastimil Babka <vbabka@xxxxxxx> wrote:

> On 4/22/20 10:47 PM, Roman Gushchin wrote:
> > Instead of having two sets of kmem_caches: one for system-wide and
> > non-accounted allocations and the second one shared by all accounted
> > allocations, we can use just one.
> >
> > The idea is simple: space for obj_cgroup metadata can be allocated
> > on demand and filled only for accounted allocations.
> >
> > It allows to remove a bunch of code which is required to handle
> > kmem_cache clones for accounted allocations. There is no more need
> > to create them, accumulate statistics, propagate attributes, etc.
> > It's a quite significant simplification.
> >
> > Also, because the total number of slab_caches is reduced almost twice
> > (not all kmem_caches have a memcg clone), some additional memory
> > savings are expected. On my devvm it additionally saves about 3.5%
> > of slab memory.
> >
> > Suggested-by: Johannes Weiner <hannes@xxxxxxxxxxx>
> > Signed-off-by: Roman Gushchin <guro@xxxxxx>
>
> Reviewed-by: Vlastimil Babka <vbabka@xxxxxxx>
>
> However, as this series will affect slab fastpaths, and perhaps
> especially this patch will affect even non-kmemcg allocations being
> freed, I'm CCing Jesper and Mel for awareness as they AFAIK did work
> on network stack memory management performance, and perhaps some
> benchmarks are in order...

Thanks for the heads-up!

We (should) all know Mel Gorman's tests, which are here[1]:
 [1] https://github.com/gormanm/mmtests

My guess is that these changes will only be visible with micro
benchmarks of slub/slab. My slab/slub micro benchmarks are located
here[2]:
 [2] https://github.com/netoptimizer/prototype-kernel/

They are kernel modules that are compiled against your devel tree and
pushed to the remote host. Results are simply printk'ed in dmesg.
The compile+push commands are documented here[3]:
 [3] https://prototype-kernel.readthedocs.io/en/latest/prototype-kernel/build-process.html

I recommend trying "slab_bulk_test01":

 modprobe slab_bulk_test01; rmmod slab_bulk_test01
 dmesg

Results from these kernel module benchmarks are included in some
commits[4][5]. And in [4] I found some overhead caused by MEMCG.

 [4] https://git.kernel.org/torvalds/c/ca257195511d
 [5] https://git.kernel.org/torvalds/c/fbd02630c6e3

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer