On Fri, 20.05.22 17:12, Lewis Gaul (lewis.gaul@xxxxxxxxx) wrote:

> To summarize the questions (taken from the second post linked above):
> - Why are private cgroups mounted read-only in non-privileged
>   containers?

"private cgroups"? What do you mean by that? The controllers?

Controller delegation on cgroupsv1 is simply not safe, that's all. You
can provide invalid configuration to the kernel, and DoS the machine
through it. cgroups are simply not a suitable privilege boundary on
cgroupsv1.

If you want safe delegation, use cgroupsv2, where delegation is safe.

> - Is it sound to override Docker’s mounting of the private container
>   cgroups under v1?

I don't know what Docker does these days, but they used to be entirely
ignorant of safe cooperation in the cgroup tree, i.e. they ignored
https://systemd.io/CGROUP_DELEGATION in its entirety, as they never
really accepted systemd's existence.

Today most distros, I think, have switched over to other ways to run
containers, i.e. podman and so on, which have a more professional
approach to all this, and can safely cooperate in a cgroup tree.

> - What are the concerns around the approach of passing '-v
>   /sys/fs/cgroup:/sys/fs/cgroup' in terms of the container’s view of its
>   cgroups?

I don't know what this does. Is this a Docker thing?

> - Is modifying/replacing the cgroup mounts set up by the container engine
>   a reasonable workaround, or could this be fragile?

I am not sure I follow? A workaround for what?

One shouldn't assume one even has the privs to modify cgroup mounts.

But why would one even?

> - When is it valid to manually manipulate container cgroups?

When you asked for your own delegated subtree first, see docs:

https://systemd.io/CGROUP_DELEGATION

> - Do container managers such as Docker and Podman correctly delegate
>   cgroups on hosts running Systemd?

podman probably does this correctly. docker didn't; not sure if that
has changed.
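
Since the advice above hinges on whether a payload actually sits in a
cgroupsv2 tree, here is a minimal sketch of how one might check that
from the contents of /proc/self/cgroup. It assumes the line format
"hierarchy-ID:controller-list:path" described in cgroups(7), where the
v2 entry has ID 0 and an empty controller list. The function name
classify_cgroup_setup is made up for illustration.

```python
def classify_cgroup_setup(proc_self_cgroup_text):
    """Classify the cgroup setup described by /proc/self/cgroup contents.

    Returns "v2" if only the unified hierarchy is listed, "v1" if only
    v1 hierarchies are listed, "mixed" if both appear, and "unknown"
    if no entries were found.
    """
    has_v1 = False
    has_v2 = False
    for line in proc_self_cgroup_text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split into at most three fields; the path itself may contain ':'.
        hierarchy_id, controllers, _path = line.split(":", 2)
        if hierarchy_id == "0" and controllers == "":
            # The cgroupsv2 entry: ID 0 and no controller list.
            has_v2 = True
        else:
            # Anything else is a v1 hierarchy (including named ones).
            has_v1 = True
    if has_v1 and has_v2:
        return "mixed"
    if has_v2:
        return "v2"
    if has_v1:
        return "v1"
    return "unknown"
```

On a Linux machine this would be called with the text read from
/proc/self/cgroup. It only tells you which hierarchies the process is a
member of, not whether a subtree was actually delegated to you.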

> - Are these container managers happy for the container to take ownership
>   of the container’s cgroup?

I am not sure I grok this question, but a correctly implemented
container manager should be able to safely run cgroups-using payloads
inside the container. In that model, a host systemd manages the root
of the tree, the container manager a cgroup further down, and the
payload of the container (for example another systemd run inside the
container) the stuff below.

> - Why are the container’s cgroup limits not set on a parent cgroup under
>   Docker/Podman?

I don't grok the question?

> - Why doesn’t Docker use another layer of indirection in the cgroup
>   hierarchy such that the limit is applied in the parent cgroup to the
>   container?

I don't understand the question. And I can't answer docker questions.

> - What happens if you have two of the same cgroup mount?

What do you mean by a "cgroup mount"? A cgroupfs controller mount? If
they are within the same cgroup namespace they will be effectively
bind mounts of each other, i.e. show the exact same contents.

> - Are there any gotchas/concerns around manipulating cgroups via multiple
>   mount points?

Why would you do that though?

> - What’s the correct way to check which controllers are enabled?

Enabled *in* *what*? In the kernel? /proc/cgroups. Mounted? "mount"
maybe? In your container mgr? Depends on that.

> - What is it that determines which controllers are enabled? Is it kernel
>   configuration applied at boot?

Enabled where?

> - Is it possible to have some controllers enabled for v1 at the same time
>   as others are enabled for v2?

Yes.

Lennart

--
Lennart Poettering, Berlin
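
To illustrate the /proc/cgroups pointer above, here is a minimal sketch
that parses that file's contents. It assumes the four-column layout
documented in cgroups(7) (subsys_name, hierarchy, num_cgroups, enabled)
with a '#'-prefixed header line. The function names are made up for
illustration.

```python
def parse_proc_cgroups(text):
    """Parse /proc/cgroups contents into a dict keyed by controller name."""
    controllers = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#subsys_name ...' header.
        if not line or line.startswith("#"):
            continue
        name, hierarchy, num_cgroups, enabled = line.split()
        controllers[name] = {
            "hierarchy": int(hierarchy),
            "num_cgroups": int(num_cgroups),
            "enabled": enabled == "1",
        }
    return controllers


def v1_attached(controllers):
    """Names of enabled controllers attached to a v1 hierarchy.

    Per cgroups(7), a hierarchy ID of 0 means the controller is not
    mounted on any v1 hierarchy (it may be bound to v2, or disabled).
    """
    return sorted(
        name
        for name, info in controllers.items()
        if info["enabled"] and info["hierarchy"] != 0
    )
```

This covers only the "in the kernel" reading of the question. What is
mounted, and what a container manager hands to its payload, has to be
checked separately.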