On Mon, Feb 03, 2020 at 05:17:34PM -0500, Johannes Weiner wrote: > On Mon, Feb 03, 2020 at 12:58:34PM -0800, Roman Gushchin wrote: > > On Mon, Feb 03, 2020 at 02:50:48PM -0500, Johannes Weiner wrote: > > > On Mon, Jan 27, 2020 at 09:34:46AM -0800, Roman Gushchin wrote: > > > > This is fairly big but mostly red patch, which makes all non-root > > > > slab allocations use a single set of kmem_caches instead of > > > > creating a separate set for each memory cgroup. > > > > > > > > Because the number of non-root kmem_caches is now capped by the number > > > > of root kmem_caches, there is no need to shrink or destroy them > > > > prematurely. They can be perfectly destroyed together with their > > > > root counterparts. This allows to dramatically simplify the > > > > management of non-root kmem_caches and delete a ton of code. > > > > > > This is definitely going in the right direction. But it doesn't quite > > > explain why we still need two sets of kmem_caches? > > > > > > In the old scheme, we had completely separate per-cgroup caches with > > > separate slab pages. If a cgrouped process wanted to allocate a slab > > > object, we'd go to the root cache and used the cgroup id to look up > > > the right cgroup cache. On slab free we'd use page->slab_cache. > > > > > > Now we have slab pages that have a page->objcg array. Why can't all > > > allocations go through a single set of kmem caches? If an allocation > > > is coming from a cgroup and the slab page the allocator wants to use > > > doesn't have an objcg array yet, we can allocate it on the fly, no? > > > > Well, arguably it can be done, but there are few drawbacks: > > > > 1) On the release path you'll need to make some extra work even for > > root allocations: calculate the offset only to find the NULL objcg pointer. > > > > 2) There will be a memory overhead for root allocations > > (which might or might not be compensated by the increase > > of the slab utilization). > > Those two are only true if there is a wild mix of root and cgroup > allocations inside the same slab, and that doesn't really happen in > practice. Either the machine is dedicated to one workload and cgroups > are only enabled due to e.g. a vendor kernel, or you have cgrouped > systems (like most distro systems now) that cgroup everything. > > > 3) I'm working on percpu memory accounting that resembles the same scheme, > > except that obj_cgroups vector is created for the whole percpu block. > > There will be root- and memcg-blocks, and it will be expensive to merge them. > > I kinda like using the same scheme here and there. > > It's hard to conclude anything based on this information alone. If > it's truly expensive to merge them, then it warrants the additional > complexity. But I don't understand the desire to share a design for > two systems with sufficiently different constraints. > > > Upsides? > > > > 1) slab utilization might increase a little bit (but I doubt it will have > > a huge effect, because both merging sets should be relatively big and well > > utilized) > > Right. > > > 2) eliminate memcg kmem_cache dynamic creation/destruction. it's nice, > > but there isn't so much code left anyway. > > There is a lot of complexity associated with the cache cloning that > isn't the lines of code, but the lifetime and synchronization rules. > > And these two things are the primary aspects that make my head hurt > trying to review this patch series. > > > So IMO it's an interesting direction to explore, but not something > > that necessarily has to be done in the context of this patchset. > > I disagree. Instead of replacing the old coherent model and its > complexities with a new coherent one, you are mixing the two. And I > can barely understand the end result. > > Dynamically cloning entire slab caches for the sole purpose of telling > whether the pages have an obj_cgroup array or not is *completely > insane*. If the controller had followed the obj_cgroup design from the > start, nobody would have ever thought about doing it like this. Having two sets of kmem_caches has nothing to do with the refcounting and obj_cgroup abstraction. Please, take a look at the final code. Thanks!