On 5/22/19 4:20 PM, Roman Gushchin wrote:
> Currently the lifetime of bpf programs attached to a cgroup is bound
> to the lifetime of the cgroup itself. It means that if a user
> forgets (or intentionally avoids) to detach a bpf program before
> removing the cgroup, it will stay attached up to the release of the
> cgroup. Since the cgroup can stay in the dying state (the state
> between being rmdir()'ed and being released) for a very long time, it
> leads to a waste of memory. Also, it blocks a possibility to implement
> the memcg-based memory accounting for bpf objects, because a circular
> reference dependency will occur. Charged memory pages are pinning the
> corresponding memory cgroup, and if the memory cgroup is pinning
> the attached bpf program, nothing will be ever released.
>
> A dying cgroup can not contain any processes, so the only chance for
> an attached bpf program to be executed is a live socket associated
> with the cgroup. So in order to release all bpf data early, let's
> count associated sockets using a new percpu refcounter. On cgroup
> removal the counter is transitioned to the atomic mode, and as soon
> as it reaches 0, all bpf programs are detached.
>
> The reference counter is not socket specific, and can be used for any
> other types of programs, which can be executed from a cgroup-bpf hook
> outside of the process context, had such a need arise in the future.
>
> Signed-off-by: Roman Gushchin <guro@xxxxxx>
> Cc: jolsa@xxxxxxxxxx

The logic looks sound to me. With one nit below,
Acked-by: Yonghong Song <yhs@xxxxxx>

> ---
>  include/linux/bpf-cgroup.h |  8 ++++++--
>  include/linux/cgroup.h     | 18 ++++++++++++++++++
>  kernel/bpf/cgroup.c        | 25 ++++++++++++++++++++++---
>  kernel/cgroup/cgroup.c     | 11 ++++++++---
>  4 files changed, 54 insertions(+), 8 deletions(-)
>
> [...]
> @@ -167,7 +178,12 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
>   */
>  #define	NR ARRAY_SIZE(cgrp->bpf.effective)
>  	struct bpf_prog_array __rcu *arrays[NR] = {};
> -	int i;
> +	int ret, i;
> +
> +	ret = percpu_ref_init(&cgrp->bpf.refcnt, cgroup_bpf_release, 0,
> +			      GFP_KERNEL);
> +	if (ret)
> +		return -ENOMEM;

Maybe return "ret" here instead of -ENOMEM. Currently, -ENOMEM is the
only error code percpu_ref_init() returns, but that could change in the
future.

>
>  	for (i = 0; i < NR; i++)
>  		INIT_LIST_HEAD(&cgrp->bpf.progs[i]);
> @@ -183,6 +199,9 @@ int cgroup_bpf_inherit(struct cgroup *cgrp)
>  cleanup:
>  	for (i = 0; i < NR; i++)
>  		bpf_prog_array_free(arrays[i]);
> +
> +	percpu_ref_exit(&cgrp->bpf.refcnt);
> +
>  	return -ENOMEM;
>  }
>
> [...]
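
To illustrate the nit, here is a minimal userspace sketch, not kernel
code. fake_ref_init() is a hypothetical stand-in for percpu_ref_init(),
and the two inherit_*() helpers are made-up names that only model the
error path of cgroup_bpf_inherit(). It shows how a hardcoded -ENOMEM
would mask any other error code the callee might return one day, while
returning "ret" passes it through unchanged.

```c
#include <assert.h>
#include <errno.h>

/*
 * Hypothetical stand-in for percpu_ref_init(): returns whatever error
 * the caller asks it to simulate (0 on success, negative errno on
 * failure). This is NOT the kernel function.
 */
int fake_ref_init(int simulated_err)
{
	return simulated_err;
}

/* Variant as posted in the patch: every failure becomes -ENOMEM. */
int inherit_hardcoded(int simulated_err)
{
	int ret = fake_ref_init(simulated_err);

	if (ret)
		return -ENOMEM;
	return 0;
}

/* Suggested variant: propagate the callee's error code unchanged. */
int inherit_propagating(int simulated_err)
{
	int ret = fake_ref_init(simulated_err);

	if (ret)
		return ret;
	return 0;
}

/* Small self-check for the case that exists today (-ENOMEM only). */
void sketch_self_check(void)
{
	assert(inherit_hardcoded(-ENOMEM) == inherit_propagating(-ENOMEM));
	assert(inherit_hardcoded(0) == 0);
}
```

With today's behaviour both variants agree, so the change is purely
defensive. They only diverge if the callee starts returning something
other than -ENOMEM (simulated here with -EINVAL).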