On Tue, May 19, 2020 at 01:18:03PM -0700, Roman Gushchin wrote:
> Percpu memory is becoming more and more widely used by various
> subsystems, and the total amount of memory controlled by the percpu
> allocator can make up a good part of the total memory.
>
> As an example, bpf maps can consume a lot of percpu memory,
> and they are created by a user. Also, some cgroup internals
> (e.g. memory controller statistics) can be quite large.
> On a machine with many CPUs and a big number of cgroups they
> can consume hundreds of megabytes.
>
> So the lack of memcg accounting is creating a breach in the memory
> isolation. Similar to the slab memory, percpu memory should be
> accounted by default.
>
> To implement the percpu accounting it's possible to take the slab
> memory accounting as a model to follow. Let's introduce two types of
> percpu chunks: root and memcg. What makes memcg chunks different is
> an additional space allocated to store memcg membership information.
> If __GFP_ACCOUNT is passed on allocation, a memcg chunk should be
> used. If it's possible to charge the corresponding size to the target
> memory cgroup, allocation is performed, and the memcg ownership data
> is recorded. System-wide allocations are performed using root chunks,
> so there is no additional memory overhead.
>
> To implement a fast reparenting of percpu memory on memcg removal,
> we don't store mem_cgroup pointers directly: instead we use obj_cgroup
> API, introduced for slab accounting.

The overall approach makes sense to me, but it'd help to have a
high-level comment explaining what's going on and why.

Thanks.

-- 
tejun
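The scheme in the quoted description can be illustrated with a minimal userspace sketch in C. Everything in it is hypothetical: `struct chunk`, `sketch_alloc()`, `sketch_free()`, `sketch_reparent()` and `SKETCH_GFP_ACCOUNT` are illustrative stand-ins, not the kernel's percpu or memcg API, and the counters are flat rather than hierarchical. It only aims to show two properties the description relies on: root chunks carry no ownership array, and because accounted objects record an obj_cgroup rather than a mem_cgroup, reparenting redirects a single pointer and moves the charge instead of walking every live object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical, heavily simplified model; none of these names are the
 * kernel's percpu or memcg API. Counters are flat (non-hierarchical).
 */
#define SKETCH_GFP_ACCOUNT 0x1u /* hypothetical stand-in for __GFP_ACCOUNT */
#define NR_SLOTS 4

struct mem_cgroup {
	struct mem_cgroup *parent;
	size_t usage; /* bytes currently charged */
	size_t limit; /* a charge fails if usage would exceed this */
};

/* Accounted objects point here, never directly at a mem_cgroup. */
struct obj_cgroup {
	struct mem_cgroup *memcg;
};

struct chunk {
	bool used[NR_SLOTS];
	size_t size[NR_SLOTS];
	/* NULL for a root chunk; per-slot owners for a memcg chunk. */
	struct obj_cgroup **owner;
};

static bool try_charge(struct mem_cgroup *memcg, size_t size)
{
	if (memcg->usage + size > memcg->limit)
		return false;
	memcg->usage += size;
	return true;
}

/*
 * Returns the slot index and sets *out to the chunk used, or returns -1
 * if the charge fails or no slot is free.
 */
static int sketch_alloc(struct chunk *root, struct chunk *memcg_chunk,
			size_t size, unsigned int flags,
			struct obj_cgroup *objcg, struct chunk **out)
{
	bool account = flags & SKETCH_GFP_ACCOUNT;
	struct chunk *c = account ? memcg_chunk : root;

	if (account && !try_charge(objcg->memcg, size))
		return -1;
	for (int i = 0; i < NR_SLOTS; i++) {
		if (c->used[i])
			continue;
		c->used[i] = true;
		c->size[i] = size;
		if (c->owner)
			c->owner[i] = objcg; /* record ownership */
		*out = c;
		return i;
	}
	if (account)
		objcg->memcg->usage -= size; /* no free slot: undo the charge */
	return -1;
}

static void sketch_free(struct chunk *c, int slot)
{
	assert(c->used[slot]);
	if (c->owner) {
		/* Uncharge whichever memcg the objcg points at now. */
		c->owner[slot]->memcg->usage -= c->size[slot];
		c->owner[slot] = NULL;
	}
	c->used[slot] = false;
}

/* On memcg removal only the objcg is redirected; live objects are untouched. */
static void sketch_reparent(struct obj_cgroup *objcg)
{
	struct mem_cgroup *child = objcg->memcg;

	child->parent->usage += child->usage;
	child->usage = 0;
	objcg->memcg = child->parent;
}
```

In this sketch, freeing an object after its cgroup has been removed uncharges the parent, because the owner is resolved through the obj_cgroup at free time rather than through a pointer stored when the object was allocated.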