I'm aware of the higher level of collaboration between podman and systemd compared to docker, hence primarily raising this issue from a podman angle.
In privileged mode all mounts are read-write, so yes the container has write access to the cgroup filesystem. (Podman also ensures write access to the systemd cgroup subsystem mount in non-privileged mode by default).
On first boot PID 1 can be found in /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/init.scope/cgroup.procs, whereas when the container restarts the 'init.scope/' directory does not exist and PID 1 is instead found in the parent (container root) cgroup /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/cgroup.procs (also reflected by /proc/1/cgroup). This is strange because systemd must be the one to create this cgroup dir in the initial boot, so I'm not sure why it wouldn't on subsequent boot?
I can confirm that the container has permissions since executing a 'mkdir' in /sys/fs/cgroup/systemd/machine.slice/libpod-<ctr-id>.scope/ inside the container succeeds after the restart, so I have no idea why systemd is not creating the 'init.scope/' dir. I notice that inside the container's systemd cgroup mount 'system.slice/' does exist, but 'user.slice/' also does not (both exist on normal boot). Is there any way I can find systemd logs that might indicate why the cgroup dir creation is failing?
One final datapoint: the same is seen when using a private cgroup namespace (via 'podman run --cgroupns=private'), although then the error is then, as expected, "Failed to attach 1 to compat systemd cgroup /init.scope: No such file or directory".
I could raise this with the podman team, but it seems more in the systemd area given it's a systemd warning and I would expect systemd to be creating this cgroup dir?
Thanks,
Lewis
On Tue, 10 Jan 2023 at 14:48, Lennart Poettering <lennart@xxxxxxxxxxxxxx> wrote:
On Di, 10.01.23 13:18, Lewis Gaul (lewis.gaul@xxxxxxxxx) wrote:
> Following 'setenforce 0' I still see the same issue (I was also suspecting
> SELinux!).
>
> A few additional data points:
> - this was not seen when using systemd v230 inside the container
> - this is also seen on CentOS 8.4
> - this is seen under docker even if the container's cgroup driver is
> changed from 'cgroupfs' to 'systemd'
docker is garbage. They are hostile towards running systemd inside
containers.
podman upstream is a lot friendly, and apparently what everyone in OCI
is going towards these days.
I have not much experience with podman though, and in particular not
old versions. Next step would probably be to look at what precisely
causes the permission issue, via strace.
but did you make sure your container actually gets write access to the
cgroup trees?
anyway, i'd recommend asking the podman community for help about this.
Lennart
--
Lennart Poettering, Berlin