[LSF/MM TOPIC] [BPF] Auto-detachment of bpf programs on cgroup removal

Roman Gushchin <guro@xxxxxx> · Fri, 26 Apr 2019 03:34:22 +0000

Currently a bpf program attached to a cgroup stays attached after removal
of the cgroup. The goal of this design was to allow capturing and handling
events which happen in a cgroup after it has been deleted by a user.
For example, an active socket can belong to a dying cgroup, and generally
we want to be able to handle it with cgroup-bpf programs. Some program types
(e.g. device control) have no chance of being executed after cgroup deletion.

This ability doesn't come for free, and there are a couple of related issues
which we're facing in production.

I. Sometimes we accumulate a large number of dying cgroups, and if each
of them is holding a few attached bpf programs with their maps, cgroup
storages, etc, it's just a large waste of memory. The reason why we sometimes
accumulate dying cgroups is a separate topic, which is planned to be discussed
at LSF/MM.

Currently we're solving this problem by detaching programs from userspace
before cgroup removal. It usually works, but in some cases it fails
(e.g. if userspace crashes) and we end up with many loaded bpf programs
wasting memory.
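
To illustrate the userspace workaround, here is a minimal sketch of detaching
a program from a cgroup with the raw bpf(2) BPF_PROG_DETACH command. The
helper name cgroup_prog_detach() is made up for this example and is not what
we run in production; libbpf's bpf_prog_detach2() wraps the same command.
It assumes the caller already holds an fd for the cgroup directory and,
for multi-attach setups, an fd for the program.

```c
#include <linux/bpf.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper: detach a program of the given attach type from
 * the cgroup referred to by cgroup_fd (an fd of the cgroup directory).
 * Returns 0 on success, -1 with errno set on failure.
 */
static int cgroup_prog_detach(int cgroup_fd, int prog_fd,
                              enum bpf_attach_type type)
{
    union bpf_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.target_fd = cgroup_fd;    /* cgroup to detach from */
    attr.attach_bpf_fd = prog_fd;  /* program to detach */
    attr.attach_type = type;       /* e.g. BPF_CGROUP_INET_INGRESS */

    return (int)syscall(__NR_bpf, BPF_PROG_DETACH, &attr, sizeof(attr));
}
```

A manager would have to call something like this for every attach type it
used before the rmdir(). If it crashes between attaching and this cleanup
step, nothing else performs the detach, which is exactly the failure mode
described above.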

II. It makes implementing memcg memory accounting for bpf objects
challenging. The problem is that any charged object holds a reference to
the corresponding memcg, the memcg keeps the cgroup in place, and the
cgroup holds a reference to the bpf program, whose underlying memory is
potentially charged to that memcg. So it's a circular reference.
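
The cycle can be shown with a toy refcount model. This is only a sketch:
the toy_* structs and functions below are invented for illustration and do
not correspond to the kernel's actual data structures or release paths. It
assumes a single program whose memory is charged to the cgroup's own memcg.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: NOT the kernel's data structures. */
struct toy_memcg  { int refs; };  /* one reference per charged object */
struct toy_prog   { int refs; };  /* one reference per attachment */
struct toy_cgroup {
    struct toy_memcg *memcg;      /* memcg keeps the cgroup in place */
    struct toy_prog *attached;    /* cgroup pins the attached program */
    bool released;
};

/* Drop the cgroup's program reference; freeing the program uncharges it. */
static void toy_detach(struct toy_cgroup *cg)
{
    struct toy_prog *p = cg->attached;

    if (!p)
        return;
    cg->attached = NULL;
    assert(p->refs > 0);
    if (--p->refs == 0) {
        assert(cg->memcg->refs > 0);
        cg->memcg->refs--;        /* program memory is uncharged */
    }
}

/* Model rmdir; returns true if the cgroup could be fully released. */
static bool toy_rmdir(struct toy_cgroup *cg, bool auto_detach)
{
    if (auto_detach)
        toy_detach(cg);           /* proposed: detach at removal time */

    /* The cgroup is released only once nothing pins its memcg. */
    if (cg->memcg->refs == 0) {
        toy_detach(cg);           /* current: detach only at release */
        cg->released = true;
    }
    return cg->released;
}

/* One cgroup, one attached program charged to the cgroup's memcg. */
static bool toy_scenario(bool auto_detach)
{
    struct toy_memcg memcg = { .refs = 1 };  /* the program's charge */
    struct toy_prog prog = { .refs = 1 };    /* held by the cgroup */
    struct toy_cgroup cg = { .memcg = &memcg, .attached = &prog };

    return toy_rmdir(&cg, auto_detach);
}
```

In this model toy_scenario(false) never releases the cgroup, because the
only place that drops the program reference is behind the memcg refcount
that the program itself holds. toy_scenario(true) breaks the cycle by
detaching at removal time.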

So the question I'd like to discuss is whether we can detach bpf programs
automatically on cgroup removal (or at some arbitrary point afterwards),
without waiting for the whole cgroup to be released. Do we need a fine-grained
mechanism to detect whether a bpf program has any chance of being executed,
for every program/attach type, or is there a simpler solution? Do we really
need the ability to execute bpf programs attached to dying cgroups?

Solving this problem is essential for implementing proper memcg-based
memory accounting for bpf objects, which can be a significant improvement
(or addition) over the existing memlock rlimit-based accounting.