[RFC] cgroup gets released after a long time

hi,
Pavel reported an issue with bpf programs attached to a cgroup:
they are not released when the cgroup is removed and are still
visible in the 'bpftool prog' list afterwards.

It seems like this is not bpf specific, because I was able
to cut the bpf code from his example and still see the delayed
release of the cgroup.

It happens only on cgroup2 fs (booted with the
systemd.unified_cgroup_hierarchy=1 kernel command line option).
Please see the program attached below and the following scenario:

TERM 1
# gcc -o test test.c

			TERM 2
			# cd /sys/kernel/debug/tracing
			# echo 1 > events/cgroup/cgroup_release/enable

TERM 1 -> create and remove group1
# ./test group1
qemu-system-x86_64: terminating on signal 15 from pid 1775 (./test)

			TERM 2
			# cat trace_pipe
			<nothing>

TERM 1 -> create and remove group2
# ./test group2
qemu-system-x86_64: terminating on signal 15 from pid 1783 (./test)

			TERM 2 - group1 being released
			# cat trace_pipe
			kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1

TERM 1 -> create and remove group3
# ./test group3
qemu-system-x86_64: terminating on signal 15 from pid 1798 (./test)

			TERM 2 - group2 being released
			# cat trace_pipe
			kworker/22:2-1135  [022] ....  2947.375526: cgroup_release: root=0 id=78 level=1 path=/group1
			kworker/22:0-1787  [022] ....  2961.501261: cgroup_release: root=0 id=78 level=1 path=/group2
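
Since this shows up only on cgroup2, a reproducer could first check
that the unified hierarchy is what is actually mounted. A minimal
sketch, assuming the mount point is passed in by the caller;
is_cgroup2() and CG2_MAGIC are hypothetical names, the value is the
one CGROUP2_SUPER_MAGIC has in <linux/magic.h>:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/vfs.h>

/*
 * Same value as CGROUP2_SUPER_MAGIC in <linux/magic.h>; repeated here
 * (under a hypothetical name) so the sketch builds without kernel headers.
 */
#define CG2_MAGIC 0x63677270

/*
 * Returns 1 if 'mnt' is on a cgroup2 filesystem, 0 if it is on some
 * other filesystem, -1 if statfs() fails (errno is left set).
 */
int is_cgroup2(const char *mnt)
{
	struct statfs st;

	assert(mnt != NULL);

	if (statfs(mnt, &st) < 0)
		return -1;

	return st.f_type == CG2_MAGIC ? 1 : 0;
}
```

Calling is_cgroup2("/sys/fs/cgroup") before the mkdir would let the
test bail out early on a cgroup v1 or hybrid setup.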


It looks like the release of the previous cgroup is triggered by
creating another cgroup.  If I don't do anything, the cgroup is
released (the tracepoint fires) after about 90 seconds.
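
To put a number on the delay, one could pull the timestamp and path
out of each cgroup_release line and compare it with the time of the
rmdir. A minimal sketch, assuming the trace_pipe line format shown
in the scenario above; parse_cgroup_release() is a hypothetical
helper, not something from the kernel tree:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Parse one trace_pipe line of the form
 *   <task>-<pid>  [cpu] ....  <secs>.<usecs>: cgroup_release: ... path=<path>
 * Stores the timestamp in *ts and the cgroup path in 'path'.
 * Returns 0 on success, -1 if the line is not a cgroup_release event
 * or the path does not fit into 'path_len' bytes.
 */
int parse_cgroup_release(const char *line, double *ts,
			 char *path, size_t path_len)
{
	const char *ev, *start, *p;
	char *end;
	size_t n;

	assert(line && ts && path);

	ev = strstr(line, ": cgroup_release:");
	if (!ev || path_len == 0)
		return -1;

	/* walk back from the ':' to the start of the timestamp token */
	start = ev;
	while (start > line && start[-1] != ' ' && start[-1] != '\t')
		start--;
	if (start == ev)
		return -1;

	*ts = strtod(start, &end);
	if (end != ev)
		return -1;

	/* the path runs from "path=" to the next whitespace or end of line */
	p = strstr(ev, "path=");
	if (!p)
		return -1;
	p += strlen("path=");

	n = strcspn(p, " \t\r\n");
	if (n >= path_len)
		return -1;

	memcpy(path, p, n);
	path[n] = '\0';
	return 0;
}
```

Feeding it the lines read from trace_pipe gives the trace timestamp
per released cgroup, which can then be lined up against when each
./test run finished.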

The cgroup_release tracepoint is triggered in css_release_work_fn,
the same function where cgroup_bpf_put is called, hence the
delay in releasing the bpf programs.

Is this expected or somehow configurable? It's confusing to see
all the bpf programs from removed cgroups still around. In Pavel's
setup there are about 100 of them.

Note, I could reproduce this only when qemu-kvm is run in the child
process, as in the example below.

thoughts? thanks,
jirka


---
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define CGROUP_PATH "/sys/fs/cgroup"

int
main(int argc, char **argv)
{
	pid_t pid;
	char path[1024];
	char proc[1024];
	int filefd;
	int rc;

	if (argc < 2) {
		fprintf(stderr, "usage: %s <cgroup-name>\n", argv[0]);
		return -1;
	}

	pid = fork();
	if (pid < 0) {
		perror("fork");
		return -1;
	}

	if (pid == 0) {
		/* child: run qemu-kvm, the parent moves it into the new cgroup */
		execl("/usr/bin/qemu-kvm",
		      "/usr/bin/qemu-kvm",
		      "-display", "none",
		      NULL);
		fprintf(stderr, "failed to start qemu process\n");
		_exit(-1);
	}

	/* parent: create the cgroup and move the child into it */
	snprintf(path, sizeof(path), "%s/%s", CGROUP_PATH, argv[1]);

	sleep(1);

	if (mkdir(path, 0755) < 0) {
		fprintf(stderr, "failed to create cgroup '%s'\n", path);
		kill(pid, SIGTERM);
		return -1;
	}

	snprintf(proc, sizeof(proc), "%s/cgroup.procs", path);

	filefd = open(proc, O_WRONLY|O_TRUNC);
	if (filefd >= 0) {
		dprintf(filefd, "%d", (int) pid);
		close(filefd);
	}

	sleep(1);

	kill(pid, SIGTERM);

	/* rmdir fails with EBUSY until the child has left the cgroup */
	do {
		rc = rmdir(path);
	} while (rc != 0 && errno == EBUSY);

	return rc ? -1 : 0;
}


