On Thu, May 16, 2019 at 3:39 AM Jiri Olsa <jolsa@xxxxxxxxxx> wrote:
>
> hi,
> Pavel reported an issue with bpf programs (attached to cgroup)
> not being released at the time when the cgroup is removed and
> are still visible in 'bpftool prog' list afterwards.

right. the workaround systemd and others are using today is to
detach bpf prog before rmdir of cgroup.
Roman has patches to do this automatically.

> It seems like this is not bpf specific, because I was able
> to cut the bpf code from his example and still see delayed
> release of cgroup.
>
> It happens only on cgroup2 fs (booted with systemd.unified_cgroup_hierarchy=1
> kernel command line option), please check the attached program
> below and following scenario:
>
> TERM 1
> # gcc -o test test.c
>
> TERM 2
> # cd /sys/kernel/debug/tracing
> # echo 1 > events/cgroup/cgroup_release/enable
>
> TERM 1 -> create and remove cgroup1
> # ./test group1
> qemu-system-x86_64: terminating on signal 15 from pid 1775 (./test)
>
> TERM 2
> # cat trace_pipe
> <nothing>
>
> TERM 1 -> create and remove cgroup2
> # ./test group2
> qemu-system-x86_64: terminating on signal 15 from pid 1783 (./test)
>
> TERM 2 - group1 being released
> # cat trace_pipe
> kworker/22:2-1135 [022] .... 2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
>
> TERM 1 -> create and remove cgroup3
> # ./test group3
> qemu-system-x86_64: terminating on signal 15 from pid 1798 (./test)
>
> TERM 2 - group2 being released
> # cat trace_pipe
> kworker/22:2-1135 [022] .... 2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
> kworker/22:0-1787 [022] .... 2961.501261: cgroup_release: root=0 id=78 level=1 path=/group2
>
>
> Looks like the previous cgroup release is triggered by creating
> another cgroup. If I don't do anything the cgroup is released
> (tracepoint shows) in about 90 seconds.
>
> The cgroup_release tracepoint is triggered in css_release_work_fn,
> the same function where the cgroup_bpf_put is called, hence the
> delay in releasing of the bpf programs.
>
> Is this expected or somehow configurable? It's confusing seeing
> all the bpf programs from removed cgroups being around. In Pavel's
> setup it's about 100 of them.
>
> Note, I could reproduce this only with qemu-kvm being run in child
> process in the example below.
>
> thoughts? thanks,
> jirka
>
>
> ---
> #include <fcntl.h>
> #include <signal.h>
> #include <stdio.h>
> #include <string.h>
> #include <sys/stat.h>
> #include <sys/types.h>
> #include <unistd.h>
>
> #define CGROUP_PATH "/sys/fs/cgroup"
>
> int
> main(int argc, char **argv)
> {
>         pid_t pid = -1;
>         char path[1024];
>         int rc;
>
>         pid = fork();
>
>         if (pid == 0) {
>                 execl("/usr/bin/qemu-kvm",
>                       "/usr/bin/qemu-kvm",
>                       "-display", "none",
>                       NULL);
>                 fprintf(stderr, "failed to start qemu process\n");
>                 _exit(-1);
>         } else {
>                 int filefd = -1;
>                 char proc[1024];
>
>                 snprintf(path, 1024, "%s/%s", CGROUP_PATH, argv[1]);
>
>                 sleep(1);
>
>                 if (mkdir(path, 0755) < 0) {
>                         fprintf(stderr, "failed to create cgroup '%s'\n", path);
>                         return -1;
>                 }
>
>                 snprintf(proc, 1024, "%s/cgroup.procs", path);
>
>                 filefd = open(proc, O_WRONLY|O_TRUNC);
>                 if (filefd > 0) {
>                         dprintf(filefd, "%u", pid);
>                         close(filefd);
>                 }
>
>                 sleep(1);
>         }
>
>         if (pid > 0)
>                 kill(pid, SIGTERM);
>         do {
>                 rc = rmdir(path);
>         } while (rc != 0);
>
>         return 0;
> }
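
For reference, the detach-before-rmdir workaround mentioned above could
look roughly like the sketch below. It is only an illustration, not what
systemd actually does and not Roman's patches: the helper name
cgroup_detach_and_rmdir() and its parameters are made up here, and it
assumes the caller knows the attach type (and, for programs attached with
BPF_F_ALLOW_MULTI, still holds the prog fd). It uses the raw bpf(2)
syscall with BPF_PROG_DETACH so it does not depend on libbpf.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (name and signature are assumptions, not from the
 * thread): detach a bpf prog of the given attach type from the cgroup
 * directory at 'path', then rmdir the cgroup.
 *
 * prog_fd: fd of the program to detach, or -1 to leave it unspecified.
 *          Assumption: a specific fd is only needed when the program
 *          was attached with BPF_F_ALLOW_MULTI.
 *
 * Returns 0 on success, -1 on failure with errno set.
 */
int cgroup_detach_and_rmdir(const char *path,
                            enum bpf_attach_type type, int prog_fd)
{
        union bpf_attr attr;
        int cgfd, rc, saved_errno;

        assert(path != NULL);

        /* The cgroup is identified by an fd on its directory. */
        cgfd = open(path, O_RDONLY | O_DIRECTORY);
        if (cgfd < 0)
                return -1;      /* errno set by open() */

        memset(&attr, 0, sizeof(attr));
        attr.target_fd = cgfd;
        attr.attach_type = type;
        if (prog_fd >= 0)
                attr.attach_bpf_fd = prog_fd;

        rc = syscall(__NR_bpf, BPF_PROG_DETACH, &attr, sizeof(attr));
        saved_errno = errno;
        close(cgfd);
        if (rc < 0) {
                errno = saved_errno;    /* report the detach failure */
                return -1;
        }

        /* Only remove the cgroup once the prog no longer hangs off it. */
        return rmdir(path);
}
```

In the reproducer above this would replace the bare rmdir(path) call,
e.g. cgroup_detach_and_rmdir(path, BPF_CGROUP_INET_INGRESS, -1), where
the attach type is a placeholder for whatever was actually attached.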