Hi Lennart, all,
TL;DR: A container that uses cgroup controllers must use the same cgroup version as the host. For a systemd container running on an arbitrary host, a lack of cgroup v1 support in systemd would therefore place a cgroup v2 requirement on the host, which is an undesirable property for a container.
I can totally understand the desire to simplify the codebase/support matrix, and appreciate this response is coming quite late (almost a year since cgroups v1 was noted as a future deprecation in systemd). However, I wanted to share a use-case/argument for keeping cgroups v1 support a little longer in case it may impact the decision at all.
At my $work we provide a container image to customers, where the container runs using systemd as the init system. The end-user has some freedom on how/where to run this container, e.g. using docker/podman on a host of their choice, or in Kubernetes (e.g. EKS in AWS).
Of course there are bounds on what we officially support, but generally we would like to support recent LTS releases of major distros, currently including Ubuntu 20.04, Ubuntu 22.04, RHEL 8, RHEL 9 and Amazon Linux 2 (EKS doesn’t yet support Amazon Linux 2023). Of these, only Ubuntu 22.04 and RHEL 9 have switched to using cgroups v2 by default, and we are not in a position to require the end-user to reconfigure their host just to run our container. What’s more, since we make use of cgroup controllers inside the container, we cannot have cgroup v1 controllers enabled on the host while attempting to use cgroups v2 inside the container.
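To make the host/container coupling concrete, here is a minimal sketch of how one could check which cgroup setup a container has been handed by parsing a /proc/self/mounts-style listing. The function name, the sample mount lines and the assumption that everything lives under /sys/fs/cgroup are mine for illustration; this is not how systemd itself detects the mode.

```python
def classify_cgroup_mounts(mounts_text, root="/sys/fs/cgroup"):
    """Classify a /proc/self/mounts-style listing.

    Returns "v2" (unified), "hybrid" (v1 controllers plus a cgroup2 mount),
    "v1" (legacy only) or "none" (no cgroup filesystems under `root`).
    The `root` default is an assumption: the conventional mount point.
    """
    has_v1 = False
    has_v2 = False
    v2_at_root = False
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        mount_point, fs_type = fields[1], fields[2]
        under_root = mount_point == root or mount_point.startswith(root + "/")
        if not under_root:
            continue
        if fs_type == "cgroup2":
            has_v2 = True
            if mount_point == root:
                v2_at_root = True
        elif fs_type == "cgroup":
            has_v1 = True
    if v2_at_root:
        return "v2"
    if has_v1 and has_v2:
        return "hybrid"
    if has_v1:
        return "v1"
    if has_v2:
        return "v2"
    return "none"
```

Inside a running container this could be fed the contents of /proc/self/mounts. For our image the result is dictated by what the host and container runtime provide, which is why we cannot pick the version ourselves.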
> Because of that I see no reason why old systemd cgroupv1 payloads
> shouldn't just work on cgroupv2 hosts: as long as you give them a
> pre-set-up cgroupv1 environment, and nothing stops you from doing
> that. In fact, this is something we even documented somewhere: what to
> do if the host only does a subset of the cgroup stuff you want, and
> what you have to do to set up the other stuff (i.e. if host doesn't
> manage your hierarchy of choice, but only others, just follow the same
> structure in the other hierarchy, and clean up after yourself). This
> is what nspawn does: if host is cgroupv2 only it will set up
> name=systemd hierarchy in cgroupv1 itself, and pass that to the
> container.
I don't think this works for us since we need the full cgroup (v1/v2) filesystem available in the container, with controllers enabled.
This means that we must, for now, continue to support cgroups v1 in our container image. If systemd were to drop support for cgroups v1 then we may find ourselves in an awkward position of not being able to upgrade to this new systemd version, or be forced to pass this restriction on to end-users. The reason we’re uncomfortable about insisting on the use of cgroups v2 is that as a container app we ideally wouldn’t place such requirements on the host.
So, while it's true that the container ecosystem does now largely support cgroups v2, there is still an aspect of caring about what the host is running, which from our perspective should be assumed to be the default configuration for the chosen distro. With this in mind, we’d ideally like to have systemd support cgroups v1 a little longer than the end of this year.
Does this make sense as a use-case and motivation for wanting new systemd versions to continue supporting cgroups v1? Of course not forever, but until there are fewer hosts out there using cgroups v1.
Best wishes,
Lewis
On Fri, 22 Jul 2022 at 11:15, Lennart Poettering <mzerqung@xxxxxxxxxxx> wrote:
On Do, 21.07.22 16:24, Stéphane Graber (stgraber@xxxxxxxxxx) wrote:
> Hey there,
>
> I believe Christian may have relayed some of this already but on my
> side, as much as I can sympathize with the annoyance of having to
> support both cgroup1 and cgroup2 side by side, I feel that we're sadly
> nowhere near the cut off point.
>
> From what I can gather from various stats we have, over 90% of LXD
> users are still on distributions relying on CGroup1.
> That's because most of them are using LTS releases of server
> distributions and those only somewhat recently made the jump to
> cgroup2:
> - RHEL 9 in May 2022
> - Ubuntu 22.04 LTS in April 2022
> - Debian 11 in August 2021
>
> OpenSUSE is still on cgroup1 by default in 15.4 for some reason.
> All this is also excluding our two largest users, Chromebooks and QNAP
> NASes, neither of them made the switch yet.
At some point I feel no sympathy there. If google/qnap/suse still are
stuck in cgroupv1 land, then that's on them, we shouldn't allow
ourselves to be held hostage by that.
I mean, that Google isn't forward looking in these things is well
known, but I am a bit surprised SUSE is still so far back.
> I honestly wouldn't be holding deprecating cgroup1 on waiting for
> those few to wake up and transition.
> Both ChromeOS and QNAP can very quickly roll it out to all their users
> should they want to.
> It's a bit trickier for OpenSUSE as it's used as the basis for SLES
> and so those enterprise users are unlikely to see cgroup2 any time
> soon.
>
> Now all of this is a problem because:
> - Our users are slow to upgrade. It's common for them to skip an
> entire LTS release and those that upgrade every time will usually wait
> 6 months to a year prior to upgrading to a new release.
> - This deprecation would prevent users of anything but the most
> recent release from running any newer containers. As it's common to
> switch to newer containers before upgrading the host, this would cause
> some issues.
> - Unfortunately the reverse is a problem too. RHEL 7 and derivatives
> are still very common as a container workload, as is Ubuntu 16.04 LTS.
> Unfortunately those releases ship with a systemd version that does not
> boot under cgroup2.
Hmm, cgroupv1 named hierarchies should still be available even on
cgroupv2 hosts. I am pretty sure nspawn at least should have no
problem with running old cgroupv1 payloads on a cgroupv2 host.
Isn't this issue just an artifact of the fact that LXD doesn't
pre-mount cgroupfs? Or does it do so these days? Because systemd's
PID1 since time began would just use the cgroup setup it finds itself
in if it's already mounted/set up. And only mount and make a choice
between cgroup1 or cgroupv2 if there's really nothing set up so far.
Because of that I see no reason why old systemd cgroupv1 payloads
shouldn't just work on cgroupv2 hosts: as long as you give them a
pre-set-up cgroupv1 environment, and nothing stops you from doing
that. In fact, this is something we even documented somewhere: what to
do if the host only does a subset of the cgroup stuff you want, and
what you have to do to set up the other stuff (i.e. if host doesn't
manage your hierarchy of choice, but only others, just follow the same
structure in the other hierarchy, and clean up after yourself). This
is what nspawn does: if host is cgroupv2 only it will set up
name=systemd hierarchy in cgroupv1 itself, and pass that to the
container.
(I mean, we might have regressed on this, since I guess this kind of
setup is not as well tested with nspawn, but I distinctly remember
that I wrote that stuff once upon a time, and it worked fine then.)
> That last issue has been biting us a bit recently but it's something
> that one can currently work around by forcing systemd back into hybrid
> mode on the host.
This should not be necessary, if LXD would do minimal cgroup setup on
its own.
> With the deprecation of cgroup1, this won't be possible anymore. You
> simply won't be able to have both CentOS7 and Fedora XYZ running in
> containers on the same system as one will only work on cgroup1 and the
> other only on cgroup2.
I am pretty sure this works fine with nspawn...
> I guess that would mean holding on to cgroup1 support until EOY 2023
> or thereabout?
That does sound OK to me. We can mark it deprecated before though,
i.e. generate warnings, and remove it from docs, as long as the actual
code stays around until then.
Thank you for the input,
Lennart
--
Lennart Poettering, Berlin