On Mon, Apr 19, 2021 at 8:56 AM Christian Brauner
<christian.brauner@xxxxxxxxxx> wrote:
>
> Hey,
>
> It's not as dramatic as it sounds but I've been mulling a cgroup feature
> for some time now which I would like to get some input on. :)
>
> So in container-land assuming a conservative layout where we treat a
> container as a separate machine we tend to give each container a
> delegated cgroup. That has already been the case with cgroup v1 and now
> even more so with cgroup v2.
>
> So usually you will have a 1:1 mapping between container and cgroup. If
> the container in addition uses a separate pid namespace then killing a
> container becomes a simple kill -9 <container-init-pid> from an ancestor
> pid namespace.
>
> However, there are quite a few scenarios where one or two of those
> assumptions aren't true, i.e. there are containers that share the cgroup
> with other processes on purpose that are supposed to be bound to the
> lifetime of the container but are not in the same pidns of the
> container. Containers that are in a delegated cgroup but share the pid
> namespace with the host or other containers.
>
> This is just the container use-case. There are additional use-cases from
> systemd services for example.
>
> For such scenarios it would be helpful to have a way to kill/signal all
> processes in a given cgroup.
>
> It feels to me that conceptually this is somewhat similar to the freezer
> feature. Freezer is now nicely implemented in cgroup.freeze. I would
> think we could do something similar for the signal feature I'm thinking
> about. So we add a file cgroup.signal which can be opened with O_RDWR
> and can be used to send a signal to all processes in a given cgroup:

and the descendant cgroups as well.

>
> int fd = open("/sys/fs/cgroup/my/delegated/cgroup", O_RDWR);
> write(fd, "SIGKILL", sizeof("SIGKILL") - 1);

The userspace oom-killers can also take advantage of this feature.