On Sat, 9 May 2020 11:32:10 +0800 Zefan Li wrote:
> If systemd is configured to use hybrid mode which enables the use of
> both cgroup v1 and v2, systemd will create new cgroup on both the default
> root (v2) and netprio_cgroup hierarchy (v1) for a new session and attach
> task to the two cgroups. If the task does some network thing then the v2
> cgroup can never be freed after the session exited.
>
> One of our machines ran into OOM due to this memory leak.
>
> In the scenario described above when sk_alloc() is called cgroup_sk_alloc()
> thought it's in v2 mode, so it stores the cgroup pointer in sk->sk_cgrp_data
> and increments the cgroup refcnt, but then sock_update_netprioidx() thought
> it's in v1 mode, so it stores netprioidx value in sk->sk_cgrp_data, so the
> cgroup refcnt will never be freed.
>
> Currently we do the mode switch when someone writes to the ifpriomap cgroup
> control file. The easiest fix is to also do the switch when a task is attached
> to a new cgroup.
>
> Fixes: bd1060a1d671("sock, cgroup: add sock->sk_cgroup")

                     ^ space missing here

> Reported-by: Yang Yingliang <yangyingliang@xxxxxxxxxx>
> Tested-by: Yang Yingliang <yangyingliang@xxxxxxxxxx>
> Signed-off-by: Zefan Li <lizefan@xxxxxxxxxx>
> ---
>
> forgot to rebase to the latest kernel.
>
> ---
>  net/core/netprio_cgroup.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/core/netprio_cgroup.c b/net/core/netprio_cgroup.c
> index 8881dd9..9bd4cab 100644
> --- a/net/core/netprio_cgroup.c
> +++ b/net/core/netprio_cgroup.c
> @@ -236,6 +236,8 @@ static void net_prio_attach(struct cgroup_taskset *tset)
>  	struct task_struct *p;
>  	struct cgroup_subsys_state *css;
>
> +	cgroup_sk_alloc_disable();
> +
>  	cgroup_taskset_for_each(p, css, tset) {
>  		void *v = (void *)(unsigned long)css->id;
>