Re: Feedback sought: can we drop cgroupv1 support soon?

Lewis Gaul <lewis.gaul@xxxxxxxxx> · Fri, 18 Aug 2023 11:15:44 +0100

> What's stopping you from mounting a private "named" cgroup v1
> hierarchy to such containers (i.e. no controllers). systemd will then
> use that when taking over and not bother with mounting anything on its
> own, such as a cgroupv2 tree.
We specifically want to be able to use cgroup controllers within the container. One example would be setting "MemoryLimit" (cgroupv1) on a systemd unit. (I understand this is deprecated in the latest versions of systemd, but as far as I can see we wouldn't be able to use the cgroupv2 "MemoryMax" setting in this scenario anyway.)

> You are doing something half broken and
> outside of the intended model already, I am not sure we need to go the
> extra mile to support this for longer.
I'm slightly surprised and disheartened by this viewpoint. I have paid close attention to https://systemd.io/CONTAINER_INTERFACE/ and https://systemd.io/CGROUP_DELEGATION/, and I had interpreted them as saying that running systemd in a container should be fully supported, and not only on cgroupv2 (at least with recent-but-not-latest systemd versions).

In particular, the following:

"Note that it is our intention to make systemd systems work flawlessly and
out-of-the-box in containers. In fact, we are interested to ensure that the same
OS image can be booted on a bare system, in a VM and in a container, and behave
correctly each time. If you notice that some component in systemd does not work
in a container as it should, even though the container manager implements
everything documented above, please contact us."

"When systemd runs as container payload it will make use of all hierarchies it
has write access to. For legacy mode you need to make at least
/sys/fs/cgroup/systemd/ available, all other hierarchies are optional."

I note that point 6 under "Some Don'ts" does correlate with what you're saying:
"Think twice before delegating cgroup v1 controllers to less privileged
containers. It’s not safe, you basically allow your containers to freeze the
system with that and worse."
However, in our case we're talking about a privileged container, so this doesn't really apply.

I think there's a definite use case here. Unfortunately, once systemd drops cgroupv1 support, this will just mean we are unable to upgrade the container's systemd version until all relevant hosts use cgroupv2 by default (probably a couple of years away).
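
For what it's worth, here is a rough sketch of how one might check which cgroup layout a given host or container exposes. It is only a path-based heuristic and rests on some assumptions: that the hierarchy is mounted at /sys/fs/cgroup, that a cgroup.controllers file marks a cgroupv2 mount, and that a hybrid setup exposes its v2 tree under /sys/fs/cgroup/unified/. The function name detect_cgroup_mode is made up for illustration. The CGROUP_DELEGATION document describes a statfs()-based check instead, which this does not implement.

```python
from pathlib import Path


def detect_cgroup_mode(root="/sys/fs/cgroup"):
    """Guess the cgroup layout under `root` (heuristic, not authoritative).

    Returns one of "unified", "hybrid", "legacy" or "unknown".
    """
    base = Path(root)

    # Assumption: a cgroupv2 mount exposes cgroup.controllers at its root.
    if (base / "cgroup.controllers").is_file():
        return "unified"

    # Assumption: hybrid mode mounts the v2 tree under <root>/unified/.
    if (base / "unified" / "cgroup.controllers").is_file():
        return "hybrid"

    # Legacy mode needs at least the named "systemd" v1 hierarchy.
    if (base / "systemd").is_dir():
        return "legacy"

    return "unknown"


if __name__ == "__main__":
    print(detect_cgroup_mode())
```

Running this on each relevant host would give a quick (if approximate) picture of how many are still on legacy or hybrid layouts.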

Thanks for your time,
Lewis

On Mon, 7 Aug 2023 at 17:26, Lennart Poettering <lennart@xxxxxxxxxxxxxx> wrote:
On Thu, 20.07.23 01:59, Dimitri John Ledkov (dimitri.ledkov@xxxxxxxxxxxxx) wrote:

> Some deployments that switch back their modern v2 host to hybrid or v1, are
> the ones that need to run old workloads that contain old systemd. Said old
> systemd only has experimental incomplete v2 support that doesn't work with
> v2-only (the one before current stable magic mount value).

What's stopping you from mounting a private "named" cgroup v1
hierarchy to such containers (i.e. no controllers). systemd will then
use that when taking over and not bother with mounting anything on its
own, such as a cgroupv2 tree.

that should be enough to make old systemd happy.

Lennart

--
Lennart Poettering, Berlin