On Fri, Oct 18, 2019 at 01:03:54PM -0400, Waiman Long wrote:
> On 10/17/19 8:28 PM, Roman Gushchin wrote:
> > The existing slab memory controller is based on the idea of replicating
> > slab allocator internals for each memory cgroup. This approach promises
> > a low memory overhead (one pointer per page), and isn't adding too much
> > code on hot allocation and release paths. But is has a very serious flaw:
>                                                 ^it^
> > it leads to a low slab utilization.
> >
> > Using a drgn* script I've got an estimation of slab utilization on
> > a number of machines running different production workloads. In most
> > cases it was between 45% and 65%, and the best number I've seen was
> > around 85%. Turning kmem accounting off brings it to high 90s. Also
> > it brings back 30-50% of slab memory. It means that the real price
> > of the existing slab memory controller is way bigger than a pointer
> > per page.
> >
> > The real reason why the existing design leads to a low slab utilization
> > is simple: slab pages are used exclusively by one memory cgroup.
> > If there are only few allocations of certain size made by a cgroup,
> > or if some active objects (e.g. dentries) are left after the cgroup is
> > deleted, or the cgroup contains a single-threaded application which is
> > barely allocating any kernel objects, but does it every time on a new CPU:
> > in all these cases the resulting slab utilization is very low.
> > If kmem accounting is off, the kernel is able to use free space
> > on slab pages for other allocations.
>
> In the case of slub memory allocator, it is not just unused space within
> a slab. It is also the use of per-cpu slabs that can hold up a lot of
> memory, especially if the tasks jump around to different cpus. The
> problem is compounded if a lot of memcgs are being used. Memory
> utilization can improve quite significantly if per-cpu slabs are
> disabled. Of course, it comes with a performance cost.

Right, but it's basically the same problem: if slabs can be used exclusively
by a single memory cgroup, slab utilization is low. Per-cpu slabs just make
the problem worse by increasing the number of mostly empty slabs in
proportion to the number of CPUs.

With memory cgroup accounting disabled, slab utilization is quite high
even with per-cpu slabs. So the problem isn't per-cpu slabs themselves;
they just were not designed to exist in so many copies.

Thanks!
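
As a rough illustration of the argument above, here is a toy Python model. It is not a measurement and not how the kernel tracks slabs: it only counts how many slabs are needed when live objects are split evenly across independent sets of slabs ("pools"). The function name `utilization` and all of the numbers (100 cgroups, 16 CPUs, 32 objects per slab, 64 live objects per cgroup) are hypothetical values chosen for the example.

```python
import math

def utilization(total_objects, pools, objects_per_slab):
    """Fraction of slab object slots in use when total_objects are
    spread evenly over `pools` independent sets of slabs.
    Each pool rounds its share up to a whole number of slabs."""
    per_pool = total_objects / pools
    slabs = pools * math.ceil(per_pool / objects_per_slab)
    return total_objects / (slabs * objects_per_slab)

# Hypothetical workload, chosen only for illustration.
CGROUPS, CPUS = 100, 16
OBJECTS_PER_SLAB = 32
OBJECTS_PER_CGROUP = 64
total = CGROUPS * OBJECTS_PER_CGROUP

# Accounting off: slabs are shared by all cgroups, one pool per CPU.
shared = utilization(total, CPUS, OBJECTS_PER_SLAB)

# Accounting on: slabs are exclusive to a cgroup, one pool per (cgroup, CPU).
per_cgroup = utilization(total, CGROUPS * CPUS, OBJECTS_PER_SLAB)

print(f"shared: {shared:.1%}, per-cgroup: {per_cgroup:.1%}")
```

In this toy setup the per-CPU split alone costs only a few percent of utilization, while multiplying the number of pools by the number of cgroups leaves most slabs nearly empty. Real numbers depend on the workload and the allocator.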