On 7/20/21 10:26 PM, Shakeel Butt wrote:
> On Mon, Jul 19, 2021 at 3:44 AM Vasily Averin <vvs@xxxxxxxxxxxxx> wrote:
>>
>> A netadmin inside a container can use 'ip a a' and 'ip r a'
>> to assign a large number of ipv4/ipv6 addresses and routing entries
>> and force the kernel to allocate megabytes of unaccounted memory
>> for long-lived per-netdevice related kernel objects:
>> 'struct in_ifaddr', 'struct inet6_ifaddr', 'struct fib6_node',
>> 'struct rt6_info', 'struct fib_rules' and ip_fib caches.
>>
>> These objects can be manually removed, though usually they live
>> in memory until their net namespace is destroyed.
>>
>> It makes sense to account for them to restrict the host's memory
>> consumption from inside the memcg-limited container.
>>
>> One such object is the 'struct fib6_node', mostly allocated in
>> net/ipv6/route.c::__ip6_ins_rt() inside the lock_bh()/unlock_bh() section:
>>
>>  write_lock_bh(&table->tb6_lock);
>>  err = fib6_add(&table->tb6_root, rt, info, mxc);
>>  write_unlock_bh(&table->tb6_lock);
>>
>> In this case it is not enough to simply add SLAB_ACCOUNT to the
>> corresponding kmem cache. The proper memory cgroup still cannot be found
>> due to the incorrect 'in_interrupt()' check used in memcg_kmem_bypass().
>>
>> The obsolete in_interrupt() does not describe the real execution context
>> properly.
>> From include/linux/preempt.h:
>>
>>  The following macros are deprecated and should not be used in new code:
>>  in_interrupt()	- We're in NMI,IRQ,SoftIRQ context or have BH disabled
>>
>> To verify the current execution context, the new macro should be used instead:
>>  in_task()	- We're in task context
>>
>> Signed-off-by: Vasily Averin <vvs@xxxxxxxxxxxxx>
>> ---
>>  mm/memcontrol.c      | 2 +-
>>  net/core/fib_rules.c | 4 ++--
>>  net/ipv4/devinet.c   | 2 +-
>>  net/ipv4/fib_trie.c  | 4 ++--
>>  net/ipv6/addrconf.c  | 2 +-
>>  net/ipv6/ip6_fib.c   | 4 ++--
>>  net/ipv6/route.c     | 2 +-
>>  7 files changed, 10 insertions(+), 10 deletions(-)
>>
>> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
>> index ae1f5d0..1bbf239 100644
>> --- a/mm/memcontrol.c
>> +++ b/mm/memcontrol.c
>> @@ -968,7 +968,7 @@ static __always_inline bool memcg_kmem_bypass(void)
>>  		return false;
>>  
>>  	/* Memcg to charge can't be determined. */
>> -	if (in_interrupt() || !current->mm || (current->flags & PF_KTHREAD))
>> +	if (!in_task() || !current->mm || (current->flags & PF_KTHREAD))
>>  		return true;
>>  
>>  	return false;
>
> Can you please also change in_interrupt() in active_memcg() as well?
> There are other unrelated in_interrupt() in that file but the one in
> active_memcg() should be coupled with this change.

Could you please elaborate?

From my point of view, active_memcg() is paired with set_active_memcg()
and is not related to this case.
active_memcg() uses the memcg that was set by set_active_memcg(),
either from the int_active_memcg per-cpu pointer or from the
current->active_memcg pointer.

I agree that with BH disabled it is incorrect to use int_active_memcg;
in that case we can still use current->active_memcg.
However, it isn't a problem: the memcg will be properly provided in both cases.

I think it's better to fix set_active_memcg()/active_memcg() in a separate patch.

Am I missing something?

Thank you,
	Vasily Averin
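
To make the difference between the two checks concrete, here is a minimal
userspace sketch. It is only a model: the bit layout and the SOFTIRQ_*
constants are assumed to mirror include/linux/preempt.h and are not taken
from this thread, all model_* names are hypothetical, and the earlier
"return false" branch of memcg_kmem_bypass() is left out.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Userspace model only. The preempt_count bit layout below is an
 * assumption meant to mirror include/linux/preempt.h.
 */
#define SOFTIRQ_SHIFT	8
#define HARDIRQ_SHIFT	16
#define NMI_SHIFT	20

#define SOFTIRQ_MASK	(0xffu << SOFTIRQ_SHIFT)
#define HARDIRQ_MASK	(0x0fu << HARDIRQ_SHIFT)
#define NMI_MASK	(0x0fu << NMI_SHIFT)

#define SOFTIRQ_OFFSET		(1u << SOFTIRQ_SHIFT)	/* serving a softirq */
#define SOFTIRQ_DISABLE_OFFSET	(2u * SOFTIRQ_OFFSET)	/* one level of BH disable */
#define HARDIRQ_OFFSET		(1u << HARDIRQ_SHIFT)	/* serving a hard IRQ */

/* Deprecated check: also true when BH is merely disabled. */
static bool model_in_interrupt(unsigned int pc)
{
	return (pc & (HARDIRQ_MASK | SOFTIRQ_MASK | NMI_MASK)) != 0;
}

/* Replacement check: false only while serving NMI, hard IRQ or softirq. */
static bool model_in_task(unsigned int pc)
{
	return (pc & (NMI_MASK | HARDIRQ_MASK | SOFTIRQ_OFFSET)) == 0;
}

/* Condition before the patch (the "-" line of the diff). */
static bool model_bypass_old(unsigned int pc, bool has_mm, bool is_kthread)
{
	return model_in_interrupt(pc) || !has_mm || is_kthread;
}

/* Condition after the patch (the "+" line of the diff). */
static bool model_bypass_new(unsigned int pc, bool has_mm, bool is_kthread)
{
	return !model_in_task(pc) || !has_mm || is_kthread;
}

void model_self_check(void)
{
	/* User task inside a write_lock_bh() section: BH disabled once. */
	unsigned int bh_off = SOFTIRQ_DISABLE_OFFSET;

	assert(model_bypass_old(bh_off, true, false));	/* accounting skipped */
	assert(!model_bypass_new(bh_off, true, false));	/* accounting possible */

	/* Really serving a softirq or hard IRQ: both variants bypass. */
	assert(model_bypass_old(SOFTIRQ_OFFSET, true, false));
	assert(model_bypass_new(SOFTIRQ_OFFSET, true, false));
	assert(model_bypass_old(HARDIRQ_OFFSET, true, false));
	assert(model_bypass_new(HARDIRQ_OFFSET, true, false));

	/* Plain task context: neither variant bypasses. */
	assert(!model_bypass_old(0, true, false));
	assert(!model_bypass_new(0, true, false));
}
```

Calling model_self_check() from a small main() exercises the three cases.
The first one corresponds to the __ip6_ins_rt() path quoted above: with BH
disabled the old condition gives up on finding the memcg, while the new
one still treats the caller as a task.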