On Thu, May 16, 2019 at 12:39:15PM +0200, Jiri Olsa wrote: > hi, > Pavel reported an issue with bpf programs (attached to cgroup) > not being released at the time when the cgroup is removed and > are still visible in 'bpftool prog' list afterwards. Hi Jiri! Can you, please, try the patch from https://github.com/rgushchin/linux/commit/f77afa1952d81a1afa6c4872d342bf6721e148e2 ? It should solve the problem, and I'm about to post it upstream. Thanks! > > It seems like this is not bpf specific, because I was able > to cut the bpf code from his example and still see delayed > release of cgroup. > > It happens only on cgroup2 fs (booted with systemd.unified_cgroup_hierarchy=1 > kernel command line option), please check the attached program > below and following scenario: > > TERM 1 > # gcc -o test test.c > > TERM 2 > # cd /sys/kernel/debug/tracing > # echo 1 > events/cgroup/cgroup_release/enable > > TERM 1 -> create and remove cgroup1 > # ./test group1 > qemu-system-x86_64: terminating on signal 15 from pid 1775 (./test) > > TERM 2 > # cat trace_pipe > <nothing> > > TERM 1 -> create and remove cgroup2 > # ./test group2 > qemu-system-x86_64: terminating on signal 15 from pid 1783 (./test) > > TERM 2 - group1 being released > # cat trace_pipe > kworker/22:2-1135 [022] .... 2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1 > > TERM 1 -> create and remove cgroup3 > # ./test group3 > qemu-system-x86_64: terminating on signal 15 from pid 1798 (./test) > > TERM 2 - group2 being released > # cat trace_pipe > kworker/22:2-1135 [022] .... 2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1 > kworker/22:0-1787 [022] .... 2961.501261: cgroup_release: root=0 id=78 level=1 path=/group2 > > > Looks like the previous cgroup release is triggered by creating > another cgroup. If I don't do anything the cgroup is released > (tracepoint shows) in about 90 seconds. > > The cgroup_release tracepoint is triggered in css_release_work_fn, > the same function where the cgroup_bpf_put is called, hence the > delay in releasing of the bpf programs. > > Is this expected or somehow configurable? It's confusing seeing > all the bpf programs from removed cgroups being around. In Pavel's > setup it's about 100 of them. > > Note, I could reproduce this only with qemu-kvm being run in child > process in the example below. > > thoughts? thanks, > jirka > > > --- > #include <fcntl.h> > #include <signal.h> > #include <stdio.h> > #include <string.h> > #include <sys/stat.h> > #include <sys/types.h> > #include <unistd.h> > > #define CGROUP_PATH "/sys/fs/cgroup" > > int > main(int argc, char **argv) > { > pid_t pid = -1; > char path[1024]; > int rc; > > pid = fork(); > > if (pid == 0) { > execl("/usr/bin/qemu-kvm", > "/usr/bin/qemu-kvm", > "-display", "none", > NULL); > fprintf(stderr, "failed to start qemu process\n"); > _exit(-1); > } else { > int filefd = -1; > char proc[1024]; > > snprintf(path, 1024, "%s/%s", CGROUP_PATH, argv[1]); > > sleep(1); > > if (mkdir(path, 0755) < 0) { > fprintf(stderr, "failed to create cgroup '%s'\n", path); > return -1; > } > > snprintf(proc, 1024, "%s/cgroup.procs", path); > > filefd = open(proc, O_WRONLY|O_TRUNC); > if (filefd > 0) { > dprintf(filefd, "%u", pid); > close(filefd); > } > > sleep(1); > } > > if (pid > 0) > kill(pid, SIGTERM); > do { > rc = rmdir(path); > } while (rc != 0); > > return 0; > }