cgroup2 labeling status

Chris PeBenito <pebenito@xxxxxxxx> · Thu, 2 May 2024 14:37:10 -0400

The state of cgroup2 labeling and memory.pressure came up for me again.
This was discussed in March of last year[1]. To summarize, refpolicy has
a type_transition that gives the memory.pressure file in cgroup2 a
default type of memory_pressure_t. For example, this file:

/sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure

The idea is that we allow daemons to write to this file without
allowing writes to everything labeled cgroup_t.  Unfortunately, the
thread ended and I haven't seen any improvement.
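
For anyone who wants to check what a given memory.pressure file ends up
with, here is a minimal sketch that reads the label through the
security.selinux xattr. The helper names (selinux_label, label_type) are
made up for illustration, and it assumes Linux with Python 3:

```python
import errno
import os


def selinux_label(path):
    """Return the SELinux context of path as a string, or None if none is readable."""
    try:
        raw = os.getxattr(path, "security.selinux")
    except OSError as exc:
        # ENODATA: no label available; ENOTSUP: no xattr support here.
        if exc.errno in (errno.ENODATA, errno.ENOTSUP):
            return None
        raise
    # The kernel may return the context with a trailing NUL byte.
    return raw.rstrip(b"\0").decode()


def label_type(context):
    """Extract the type field (third colon-separated field) from a context."""
    parts = context.split(":")
    return parts[2] if len(parts) >= 3 else None
```

Calling selinux_label() on the journald path above and passing the
result to label_type() should give memory_pressure_t if the transition
applied, and cgroup_t if it did not.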

The conclusion was[3]:

> Ah, now I remembered that we made it such that the transitions would
> only apply if the parent directory has a label explicitly set by
> userspace (via setxattr). Not sure if we can improve it easily, since
> we can't use the normal inode-based logic for cgroupfs (the xattrs are
> stored in kernfs nodes, each of which can be exposed via multiple
> inodes if there is more than one cgroupfs mount).

Testing on a 6.6 kernel and systemd 255, I still see the same issues:
most memory.pressure files are stuck at cgroup_t, while the user.slice
entries get memory_pressure_t[2].  Based on my investigations,
user.slice works because systemd sets the user.invocation_id xattr on
these dirs.
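
A rough way to reproduce this survey is to walk the cgroup2 tree and
print, for each memory.pressure file, its label next to the user.*
xattrs on its parent directory. This is only a sketch: the function name
is hypothetical, /sys/fs/cgroup is assumed to be the cgroup2 mount
point, and it needs Linux with Python 3:

```python
import os


def survey_memory_pressure(root="/sys/fs/cgroup"):
    """Yield (file, label, user xattrs of the parent dir) per memory.pressure."""
    for dirpath, _dirnames, filenames in os.walk(root):
        if "memory.pressure" not in filenames:
            continue
        target = os.path.join(dirpath, "memory.pressure")
        try:
            raw = os.getxattr(target, "security.selinux")
            label = raw.rstrip(b"\0").decode()
        except OSError:
            label = None  # no label readable, e.g. SELinux disabled
        try:
            names = os.listxattr(dirpath)
        except OSError:
            names = []
        user_xattrs = [n for n in names if n.startswith("user.")]
        yield target, label, user_xattrs
```

Looping over survey_memory_pressure() and printing each tuple makes it
easy to compare which directories carry user.invocation_id against
which files got memory_pressure_t.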

Next, I tried modifying systemd to use its version of
setfscreatecon()+mkdir() when it creates the cgroup directories.  This
did not change the labeling behavior.  Then I changed the code to a
post-mkdir setfilecon(), and all the memory.pressure files finally had
the expected labeling.
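
To illustrate the variant that worked, here is a small sketch of the
"mkdir, then explicitly set the label" sequence. It is not the systemd
change itself. It assumes that setfilecon() amounts to a setxattr() of
security.selinux on the path, and the function name, the injectable
setxattr parameter, and the context string in the usage note are
placeholders for illustration:

```python
import os

SELINUX_XATTR = "security.selinux"


def mkdir_and_label(path, context, setxattr=os.setxattr):
    """Create a directory, then explicitly set its SELinux label.

    The explicit set after mkdir mirrors a post-mkdir setfilecon();
    the label is assumed to be written via the security.selinux xattr.
    """
    os.mkdir(path, 0o755)
    # Explicit userspace label set on the freshly created directory.
    setxattr(path, SELINUX_XATTR, context.encode())
```

For example, mkdir_and_label("/sys/fs/cgroup/test.slice",
"system_u:object_r:cgroup_t:s0") would need to run as root on an
SELinux-enabled system, with a context that is valid for the loaded
policy.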

This setxattr() requirement is unfortunate, and the fact that
setfscreatecon() doesn't work makes it more unfortunate.  Is there any
improvement being worked on?

[1] https://lore.kernel.org/selinux/87mt47ga29.fsf@xxxxxxxxxxx/
[2] https://lore.kernel.org/selinux/CAEjxPJ77ZiWTwJ=hj2DFoNCg4XZMfiU6VNSNAnyCKc0Rd+nM6Q@xxxxxxxxxxxxxx/
[3] https://lore.kernel.org/selinux/CAFqZXNtLFsmb3n+H=7Jcp1g_sLEFdRL75fzvjMvTU1rXvaQXMA@xxxxxxxxxxxxxx/

--
Chris PeBenito