Re: [PATCH 00/16] The new slab memory controller

Roman Gushchin <guro@xxxxxx> · Tue, 22 Oct 2019 15:59:06 +0000

On Tue, Oct 22, 2019 at 03:31:48PM +0200, Michal Hocko wrote:
> On Thu 17-10-19 17:28:04, Roman Gushchin wrote:
> > This patchset provides a new implementation of the slab memory controller,
> > which aims to reach a much better slab utilization by sharing slab pages
> > between multiple memory cgroups. Below is the short description of the new
> > design (more details in commit messages).
> > 
> > Accounting is performed per-object instead of per-page. Slab-related
> > vmstat counters are converted to bytes. Charging is performed on page-basis,
> > with rounding up and remembering leftovers.
> > 
> > Memcg ownership data is stored in a per-slab-page vector: for each slab page
> > a vector of corresponding size is allocated. To keep slab memory reparenting
> > working, instead of saving a pointer to the memory cgroup directly an
> > intermediate object is used. It's simply a pointer to a memcg (which can be
> > easily changed to the parent) with a built-in reference counter. This scheme
> > allows to reparent all allocated objects without walking them over and changing
> > memcg pointer to the parent.
> > 
> > Instead of creating an individual set of kmem_caches for each memory cgroup,
> > two global sets are used: the root set for non-accounted and root-cgroup
> > allocations and the second set for all other allocations. This allows to
> > simplify the lifetime management of individual kmem_caches: they are destroyed
> > with root counterparts. It allows to remove a good amount of code and make
> > things generally simpler.
> 
> What is the performance impact?

As I wrote, so far we haven't found any regression on any real-world workload.
Of course, it's pretty easy to come up with a synthetic test which will show
some performance hit: e.g. allocate and free a large number of objects from a
single cache from a single cgroup. The reason is simple: stats and accounting
are more precise, so they require more work per allocation. But I don't think
it's a real problem.
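
To make the extra per-allocation work concrete, here is a toy Python model
(not kernel code) of byte-precise statistics combined with page-granular
charging, which is how I'd illustrate "rounding up and remembering
leftovers". The class name, the method names and the 4096-byte page size
are assumptions made up for this sketch; the real code differs.

```python
PAGE_SIZE = 4096  # assumed page size, for illustration only


class ToyCgroupCharge:
    """Toy model: stats are kept in bytes, charges are taken in whole pages."""

    def __init__(self):
        self.charged_pages = 0   # what a page-based counter would see
        self.used_bytes = 0      # byte-precise slab statistic
        self.leftover_bytes = 0  # charged, but not yet handed out

    def alloc(self, size):
        # Every allocation touches the byte counters: this is the extra work.
        if size > self.leftover_bytes:
            missing = size - self.leftover_bytes
            pages = -(-missing // PAGE_SIZE)  # round up to whole pages
            self.charged_pages += pages
            self.leftover_bytes += pages * PAGE_SIZE
        self.leftover_bytes -= size
        self.used_bytes += size

    def free(self, size):
        # Freed bytes return to the leftover pool; whole pages are uncharged.
        self.used_bytes -= size
        self.leftover_bytes += size
        pages = self.leftover_bytes // PAGE_SIZE
        self.charged_pages -= pages
        self.leftover_bytes -= pages * PAGE_SIZE
```

In this model the invariant charged_pages * PAGE_SIZE == used_bytes +
leftover_bytes holds after every call, and a tight alloc/free loop pays for
the bookkeeping on each iteration, which is what a synthetic test measures.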

On the other hand, I expect to see some positive effects from the significantly
reduced number of unmovable pages: memory fragmentation should decrease.
And all kernel objects will reside on a smaller number of pages, so we can
expect better cache utilization.
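
A back-of-the-envelope sketch of where the page savings come from, again in
Python. It assumes perfect packing and ignores per-CPU and per-node partial
slabs, so it is a best case, not a measurement. The function names and the
numbers in the usage note are made up for illustration.

```python
import math


def pages_per_cgroup_caches(objects_per_cgroup, objects_per_page):
    """Each cgroup has its own slab pages: round up once per cgroup."""
    return sum(math.ceil(n / objects_per_page) for n in objects_per_cgroup)


def pages_shared_caches(objects_per_cgroup, objects_per_page):
    """All cgroups share slab pages: round up once, on the total."""
    return math.ceil(sum(objects_per_cgroup) / objects_per_page)
```

For example, with 100 cgroups holding 3 objects each and 21 objects per
page, pages_per_cgroup_caches([3] * 100, 21) gives 100 pages, while
pages_shared_caches([3] * 100, 21) gives 15.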

> Also what is the effect on the memory
> reclaim side and the isolation. I would expect that mixing objects from
> different cgroups would have a negative/unpredictable impact on the
> memcg slab shrinking.

Slab shrinking already works on a per-object basis, so there are no changes
here.

Quite the opposite: the freed space can now be reused by other cgroups,
whereas previously shrinking was often a useless operation, because nobody
could reuse the space until all objects on the page were freed and the page
was returned to the page allocator.
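
A toy Python model of that reuse, with a per-slot owner vector on a shared
slab page. The class and method names are hypothetical, and the owner is a
plain string here instead of the intermediate reference-counted object
described in the cover letter.

```python
class ToySlabPage:
    """Toy slab page: a fixed number of slots plus a per-slot owner vector."""

    def __init__(self, nr_slots):
        self.owners = [None] * nr_slots  # None means the slot is free

    def alloc(self, cgroup):
        # Take the first free slot, no matter which cgroup used it before.
        for slot, owner in enumerate(self.owners):
            if owner is None:
                self.owners[slot] = cgroup
                return slot
        return None  # page is full

    def free(self, slot):
        self.owners[slot] = None

    def is_empty(self):
        # Only a completely empty page can go back to the page allocator.
        return all(owner is None for owner in self.owners)
```

If cgroup "A" fills a 4-slot page and the shrinker then frees one object,
the next alloc("B") lands in that slot right away, although is_empty() is
still False. With per-cgroup pages the same slot would stay unusable for
"B" until the other three objects were freed as well.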

Thanks!