On Thu, Aug 27, 2020 at 10:52 AM Roman Gushchin <guro@xxxxxx> wrote:
>
> Remote memcg charging API uses current->active_memcg to store the
> currently active memory cgroup, which overwrites the memory cgroup
> of the current process. It works well for normal contexts, but doesn't
> work for interrupt contexts: indeed, if an interrupt occurs during
> the execution of a section with an active memcg set, all allocations
> inside the interrupt will be charged to the active memcg set (given
> that we'll enable accounting for allocations from an interrupt
> context). But because the interrupt might have no relation to the
> active memcg set outside, it's obviously wrong from the accounting
> perspective.
>
> To resolve this problem, let's add a global percpu int_active_memcg
> variable, which will be used to store an active memory cgroup which
> will be used from interrupt contexts. set_active_memcg() will
> transparently use current->active_memcg or int_active_memcg depending
> on the context.
>
> To make the read part simple and transparent for the caller, let's
> introduce two new functions:
>   - struct mem_cgroup *active_memcg(void),
>   - struct mem_cgroup *get_active_memcg(void).
>
> They are returning the active memcg if it's set, hiding all
> implementation details: where to get it depending on the current context.
>
> Signed-off-by: Roman Gushchin <guro@xxxxxx>

I like this patch. Internally we have a similar patch which, instead of
a per-cpu int_active_memcg, has current->active_memcg_irq. Our use-case
was radix tree node allocations, where we use the root node's memcg to
charge all the nodes of the tree. The reason behind it was that we
observed a lot of zombies which were stuck due to radix tree node charges
while the actual pages pointed to by those nodes/entries were in use by
active jobs (shared file system, and the kernel is older than the kmem
reparenting).

Reviewed-by: Shakeel Butt <shakeelb@xxxxxxxxxx>
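
For anyone following along, here is a rough userspace sketch of the context-dependent selection described in the commit message. It is not the patch itself. The `in_irq` flag stands in for a kernel context check such as in_interrupt(), `current_task` stands in for `current`, and a plain global stands in for the per-cpu variable. Returning the previous value from set_active_memcg() and the helper memcg_seen_by_irq() are assumptions made for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct mem_cgroup { const char *name; };

/* Hypothetical stand-ins for kernel state in this userspace model. */
struct task { struct mem_cgroup *active_memcg; };

static struct task current_task;             /* models `current` */
static struct mem_cgroup *int_active_memcg;  /* models the per-cpu variable */
static bool in_irq;                          /* models an in_interrupt() check */

/* Store the active memcg in the slot matching the current context.
 * Returning the old value is an assumption of this sketch. */
static struct mem_cgroup *set_active_memcg(struct mem_cgroup *memcg)
{
	struct mem_cgroup *old;

	if (in_irq) {
		old = int_active_memcg;
		int_active_memcg = memcg;
	} else {
		old = current_task.active_memcg;
		current_task.active_memcg = memcg;
	}
	return old;
}

/* Read side: the caller never needs to know which slot is consulted. */
static struct mem_cgroup *active_memcg(void)
{
	return in_irq ? int_active_memcg : current_task.active_memcg;
}

/* Illustration only: what does an interrupt see while task context has
 * `task_memcg` set as the active memcg? Must be called from task context. */
static struct mem_cgroup *memcg_seen_by_irq(struct mem_cgroup *task_memcg)
{
	assert(!in_irq);

	struct mem_cgroup *old = set_active_memcg(task_memcg);

	in_irq = true;                       /* interrupt arrives */
	struct mem_cgroup *seen = active_memcg();
	in_irq = false;                      /* interrupt returns */

	set_active_memcg(old);               /* restore the task's previous value */
	return seen;
}
```

In this model, an interrupt that fires inside a section with an active memcg set reads its own slot, so it does not inherit the interrupted task's active memcg. That is the accounting property the commit message is after.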