On 5/3/2024 8:00 AM, Stephen Smalley wrote:
On Thu, May 2, 2024 at 3:16 PM Chris PeBenito <pebenito@xxxxxxxx> wrote:
On 5/2/2024 2:53 PM, Stephen Smalley wrote:
On Thu, May 2, 2024 at 2:37 PM Chris PeBenito <pebenito@xxxxxxxx> wrote:
The state of cgroup2 labeling and memory.pressure came up for me again.
This was discussed in March last year[1]. To summarize, refpolicy has a
type_transition that gives the memory.pressure file in cgroup2 a default
type of memory_pressure_t, for example on this file:
/sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
The idea is to allow daemons to write to this file without allowing
writes to all of cgroup_t. Unfortunately, the thread ended and I haven't
seen any improvement since.
The conclusion was[3]:
Ah, now I remember that we made it so that the transitions would
only apply if the parent directory has a label explicitly set by
userspace (via setxattr). Not sure if we can improve it easily, since
we can't use the normal inode-based logic for cgroupfs (the xattrs are
stored in kernfs nodes, each of which can be exposed via multiple
inodes if there is more than one cgroupfs mount).
Testing on a 6.6 kernel and systemd 255, I still see the same issues:
most memory.pressure files are stuck at cgroup_t, while the user.slice
entries get memory_pressure_t[2]. Based on my investigations, user.slice
works because systemd sets the user.invocation_id xattr on those dirs.
Next, I tried modifying systemd to use its version of
setfscreatecon()+mkdir() when it creates the cgroup directories. This
did not change the labeling behavior. Then I changed the code to call
setfilecon() after mkdir(), and all the memory.pressure files finally
got the expected label.
This setxattr() requirement is unfortunate, and the fact that
setfscreatecon() doesn't work makes it more so. Is anyone working on
an improvement?
Possibly I misunderstand, but selinux_kernfs_init_security() appears
to honor the create_sid (setfscreatecon) if set, so I would have
expected that to work.
Does there need to be an xattr on the cgroup2 fs root directory for this
to work? Based on the tracing I did on the systemd code, the post-mkdir
setfilecon() would have happened on the root dir, but the
setfscreatecon() version of the code change couldn't have changed
anything on the cgroup2 root dir, since that directory already exists
and is never created by systemd's mkdir().
That could be the case, based on Ondrej's statement on the earlier
thread. So it isn't a limitation of the SELinux code per se, but rather
of the cgroup2/kernfs code.
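A simplified userspace model of the decision being discussed, based on a reading of selinux_kernfs_init_security() in recent kernels: if the parent kernfs node has no stored security.selinux xattr, nothing is stored on the new node and its inode falls back to the genfscon label (cgroup_t here); otherwise the fscreate context wins if one is set, and only then is the type_transition consulted. struct kn_model, new_node_label() and example_transition() are hypothetical; the real code works on SIDs and also takes the creating task's domain and the object class into account.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a kernfs node: only its stored SELinux xattr. */
struct kn_model {
    const char *selinux_xattr;   /* NULL: no security.selinux xattr stored */
};

/* Hypothetical stand-in for the policy's name-based type_transition lookup. */
typedef const char *(*transition_fn)(const char *parent_ctx, const char *name);

/*
 * Context stored on a newly created child node, or NULL when nothing is
 * stored (the inode then falls back to the genfscon label, e.g. cgroup_t).
 */
static const char *new_node_label(const struct kn_model *parent,
                                  const char *fscreate_ctx, const char *name,
                                  transition_fn transition)
{
    assert(parent != NULL && name != NULL && transition != NULL);
    if (parent->selinux_xattr == NULL)
        return NULL;                 /* parent never labeled via setxattr */
    if (fscreate_ctx != NULL)
        return fscreate_ctx;         /* setfscreatecon() is honored here */
    return transition(parent->selinux_xattr, name);
}

/* Hypothetical policy: memory.pressure -> memory_pressure_t, else inherit. */
static const char *example_transition(const char *parent_ctx, const char *name)
{
    if (strcmp(name, "memory.pressure") == 0)
        return "system_u:object_r:memory_pressure_t:s0";
    return parent_ctx;
}
```

In this model, a service directory created under an unlabeled root gets nothing stored even with an fscreate context set, so it and everything created beneath it keep the genfscon label. Once the root carries an explicit label, new children get one too, and memory.pressure picks up memory_pressure_t when no fscreate context is in effect.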
I think I've reached the end of what I can debug from userspace. I
changed the genfscon to no_access_t so it would be obvious where the
genfs label was still in use on a file. That showed a relabel is
needed because entries are created during the initramfs, even though
systemd supposedly relabels /sys/fs/cgroup (still looking into this)
just after loading the policy. I added a tmpfiles.d entry to relabel
the filesystem, retried the setfscreatecon() version, and the results
were quite weird:
root [ /home/pebenito ]# ls -lZ /sys/fs/cgroup/*/*/memory.pressure
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 May 6 12:21 /sys/fs/cgroup/system.slice/auditd.service/memory.pressure
-rw-r--r--. 1 root root system_u:object_r:memory_pressure_t:s0 0 May 6 12:19 /sys/fs/cgroup/system.slice/boot-efi.mount/memory.pressure
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 May 6 12:19 /sys/fs/cgroup/system.slice/chronyd.service/memory.pressure
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 May 6 12:19 /sys/fs/cgroup/system.slice/crond.service/memory.pressure
-rw-r--r--. 1 messagebus messagebus system_u:object_r:cgroup_t:s0 0 May 6 12:19 /sys/fs/cgroup/system.slice/dbus.service/memory.pressure
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 May 6 12:19 /sys/fs/cgroup/system.slice/hypervkvpd.service/memory.pressure
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 May 6 12:19 /sys/fs/cgroup/system.slice/hypervvssd.service/memory.pressure
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 May 6 12:19 /sys/fs/cgroup/system.slice/irqbalance.service/memory.pressure
-rw-r--r--. 1 root root system_u:object_r:cgroup_t:s0 0 May 6 12:19 /sys/fs/cgroup/system.slice/sshd.service/memory.pressure
-rw-r--r--. 1 root root system_u:object_r:memory_pressure_t:s0 0 May 6 12:19 /sys/fs/cgroup/system.slice/sysroot.mount/memory.pressure
-rw-r--r--. 1 root root system_u:object_r:memory_pressure_t:s0 0 May 6 12:19 /sys/fs/cgroup/system.slice/system-getty.slice/memory.pressure
[...]
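To tally results like these without reading through the ls -lZ output, something along the lines of the following C sketch could help; it expands the same glob and reads each file's label with getxattr(2). context_type() and report_types() are hypothetical helpers, and the sketch assumes glibc on Linux. As far as I know, reading security.selinux reports the inode's effective label either way, so this cannot tell a stored label from the genfscon fallback.

```c
#include <assert.h>
#include <glob.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Copy the type (third colon-separated field) of an SELinux context into out. */
static int context_type(const char *ctx, char *out, size_t outlen)
{
    assert(ctx != NULL && out != NULL);
    const char *p = strchr(ctx, ':');            /* end of the user field */
    if (p == NULL || (p = strchr(p + 1, ':')) == NULL)
        return -1;                               /* no role/type separator */
    p++;                                         /* start of the type field */
    size_t len = strcspn(p, ":");
    if (len == 0 || len >= outlen)
        return -1;
    memcpy(out, p, len);
    out[len] = '\0';
    return 0;
}

/* Print "type path" for each file matching pattern; return the count or -1. */
static int report_types(const char *pattern)
{
    assert(pattern != NULL);
    glob_t g;
    int rc = glob(pattern, 0, NULL, &g);

    if (rc == GLOB_NOMATCH)
        return 0;
    if (rc != 0)
        return -1;
    for (size_t i = 0; i < g.gl_pathc; i++) {
        char ctx[256], type[128];
        ssize_t n = getxattr(g.gl_pathv[i], "security.selinux", ctx, sizeof(ctx) - 1);

        if (n > 0)
            ctx[n] = '\0';   /* the value usually carries its own NUL already */
        if (n <= 0 || context_type(ctx, type, sizeof(type)) < 0)
            snprintf(type, sizeof(type), "?");
        printf("%-20s %s\n", type, g.gl_pathv[i]);
    }
    int count = (int)g.gl_pathc;
    globfree(&g);
    return count;
}
```

Called from a small main() as report_types("/sys/fs/cgroup/*/*/memory.pressure"), it prints one type per line, which makes the cgroup_t versus memory_pressure_t split easy to count.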
In case it was due to entries created in the initramfs, I tried
restarting auditd, but its memory.pressure still got cgroup_t. I
added a type_transition for all domains, but still get cgroup_t. I
can't explain why some memory.pressure files get the expected label
while others don't.
--
Chris PeBenito