The state of cgroup2 labeling and memory.pressure came up for me again.
This was discussed March last year[1]. To summarize, refpolicy has a
type_transition for the memory.pressure file in cgroup2 to a default of
memory_pressure_t, for example on this file:
/sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
The idea is to allow daemons to write to this file without allowing
writes to all of cgroup_t. Unfortunately, the thread ended and I haven't
seen any improvement since.
The conclusion was[3]:
> Ah, now I remembered that we made it such that the transitions would
> only apply if the parent directory has a label explicitly set by
> userspace (via setxattr). Not sure if we can improve it easily, since
> we can't use the normal inode-based logic for cgroupfs (the xattrs are
> stored in kernfs nodes, each of which can be exposed via multiple
> inodes if there is more than one cgroupfs mount).
Testing on a 6.6 kernel with systemd 255, I still see the same issue:
most memory.pressure files are stuck at cgroup_t, while the user.slice
entries get memory_pressure_t[2]. Based on my investigation, user.slice
works because systemd sets the user.invocation_id xattr on these dirs.
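For anyone who wants to check their own system, here is a rough Python
sketch that tallies the SELinux type of every memory.pressure file under
a cgroup2 mount. It is only an illustration: the function names are made
up for this example, the mount point /sys/fs/cgroup is assumed, and the
getxattr parameter exists only so the logic can be exercised without a
real SELinux system.

```python
import os
from collections import Counter

SELINUX_XATTR = "security.selinux"


def selinux_type(context):
    """Return the type field of a 'user:role:type[:level]' context string."""
    fields = context.split(":")
    return fields[2] if len(fields) >= 3 else context


def survey_memory_pressure(root="/sys/fs/cgroup", getxattr=None):
    """Count the SELinux types of all memory.pressure files under root.

    root and getxattr are assumptions of this sketch; by default the
    label is read with os.getxattr() from the security.selinux xattr.
    """
    getxattr = getxattr or os.getxattr
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        if "memory.pressure" not in filenames:
            continue
        path = os.path.join(dirpath, "memory.pressure")
        try:
            raw = getxattr(path, SELINUX_XATTR)
        except OSError:
            # No label readable (e.g. SELinux disabled or no permission).
            counts["<unreadable>"] += 1
            continue
        # The stored context may carry a trailing NUL byte.
        context = raw.rstrip(b"\0").decode()
        counts[selinux_type(context)] += 1
    return counts
```

Printing survey_memory_pressure().most_common() on an affected system
should show how many files have each type.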
Next, I tried modifying systemd to use its version of
setfscreatecon()+mkdir() when it creates the cgroup directories. This
did not change the labeling behavior. I then changed the code to call
setfilecon() after mkdir(), and all the memory.pressure files finally
had the expected labeling.
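To illustrate the ordering that worked, here is a minimal Python sketch
of "mkdir, then set the label explicitly". It is not the systemd change
itself: the function name, the example path, and the context string are
hypothetical. It writes the security.selinux xattr directly with
os.setxattr(), which to my understanding is what setfilecon() amounts
to, and the setxattr parameter is only there so the sketch can be tried
without SELinux.

```python
import os

SELINUX_XATTR = "security.selinux"


def make_labeled_cgroup(path, context, setxattr=None):
    """Create a cgroup directory, then label it explicitly after mkdir.

    The explicit setxattr on the directory is the step that makes the
    memory.pressure type_transition apply, per the quoted explanation.
    """
    setxattr = setxattr or os.setxattr
    os.mkdir(path, 0o755)
    # Mirror setfilecon(), which (as far as I know) passes the context
    # string including its trailing NUL byte.
    setxattr(path, SELINUX_XATTR, context.encode() + b"\0")


# Hypothetical usage, as root on an SELinux system:
# make_labeled_cgroup("/sys/fs/cgroup/system.slice/example.service",
#                     "system_u:object_r:cgroup_t:s0")
```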
This setxattr() requirement is unfortunate, and the fact that
setfscreatecon() doesn't work makes it more so. Is there any
improvement being worked on?
[1] https://lore.kernel.org/selinux/87mt47ga29.fsf@xxxxxxxxxxx/
[2]
https://lore.kernel.org/selinux/CAEjxPJ77ZiWTwJ=hj2DFoNCg4XZMfiU6VNSNAnyCKc0Rd+nM6Q@xxxxxxxxxxxxxx/
[3]
https://lore.kernel.org/selinux/CAFqZXNtLFsmb3n+H=7Jcp1g_sLEFdRL75fzvjMvTU1rXvaQXMA@xxxxxxxxxxxxxx/
--
Chris PeBenito