Re: cgroup2 labeling question

Dominick Grift <dominick.grift@xxxxxxxxxxx> · Mon, 20 Mar 2023 16:23:13 +0100

Stephen Smalley <stephen.smalley.work@xxxxxxxxx> writes:

> On Mon, Mar 20, 2023 at 10:46 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
>>
>> On Mon, Mar 20, 2023 at 3:19 PM Dominick Grift
>> <dominick.grift@xxxxxxxxxxx> wrote:
>> >
>> > Ondrej Mosnacek <omosnace@xxxxxxxxxx> writes:
>> >
>> > > On Mon, Mar 20, 2023 at 2:59 PM Dominick Grift
>> > > <dominick.grift@xxxxxxxxxxx> wrote:
>> > >>
>> > >> Stephen Smalley <stephen.smalley.work@xxxxxxxxx> writes:
>> > >>
>> > >> > On Mon, Mar 20, 2023 at 3:25 AM Dominick Grift
>> > >> > <dominick.grift@xxxxxxxxxxx> wrote:
>> > >> >>
>> > >> >>
>> > >> >> Hi,
>> > >> >>
>> > >> >> I was reading this pull request [1] and looked into how I might be able
>> > >> >> to implement this in policy but there seem to be some technical
>> > >> >> difficulties.
>> > >> >>
>> > >> >> * I already use getfscon to seperate the systemd user.slice because the
>> > >> >>   system manager delegates the user.slice to the user manager.
>> > >> >>
>> > >> >>   (genfscon "cgroup2" "/user.slice" cgroupfile_context)
>> > >> >>
>> > >> >>   In the past the proved to be a racy where systemd attempts to
>> > >> >>   write before the object has the context associated with the genfscon.
>> > >> >
>> > >> > I don't understand how this could be racy - genfscon-assigned contexts
>> > >> > should be assigned when the dentry is first instantiated via
>> > >> > inode_donit_with_dentry and therefore the inode shouldn't be
>> > >> > accessible to userspace prior to this initial assignment AFAIK.
>> > >> > Possibly I am missing something.
>> > >>
>> > >> I recall encountering this sporadically, but I admit that it has been a
>> > >> while since I supressed it in policy. I might try to reproduce. AFAIK my
>> > >> policy is the only policy that actually labels some trees on cgroup2 fs
>> > >> with private types currently.
>> > >>
>> > >> >
>> > >> >>   I decided to dontaudit attempts to write to the mislabeled object and
>> > >> >>   it *seems* as if systemd retries until it can write it i.e. when the
>> > >> >>   object carries the expected label and so that seems to work eventually
>> > >> >>   but it looks fragile.
>> > >> >>
>> > >> >> * The challenge with memory pressure implementation [2] is that these
>> > >> >>   "memory.pressure" files end up in random locations under
>> > >> >>   "/system.slice" for example:
>> > >> >>
>> > >> >>   /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
>> > >> >>
>> > >> >>   Where in the above systemd-journald.service might be
>> > >> >>   templated (systemd-journald@FOO.service). Point is that the path is
>> > >> >>   random. genfscon does not support regex and glob. I can't do for example:
>> > >> >>
>> > >> >>   (genfscon "cgroup2" "/system.slice/.*/memory.pressure"
>> > >> >>   cgroupfile_context)
>> > >> >>
>> > >> >>   Fortunately cgroup2fs supports relabeling but if systemd has to
>> > >> >>   manually relabel the cgroup files then I would imagine that this is
>> > >> >>   racy as well, and that does not really solve the underlying issue.
>> > >> >>
>> > >> >>   I am looking for ideas and suggestions
>> > >> >
>> > >> > Optimally one of two things would happen:
>> > >> > 1. The kernel would label the inode correctly when it is first created
>> > >> > (e.g. by augmenting genfscon to support more general matching), or
>> > >> > 2. The userspace component that creates these files would label them
>> > >> > correctly at creation (via setfscreatecon() prior to creation).
>> > >>
>> > >> Agree but 1. would require regex/glob support for genfscon and 2. these
>> > >> files aren't "created" by userspace AFAIK and so setfscreatecon or
>> > >> automatic object type transitions are probably not an option here.
>> > >>
>> > >> >
>> > >> > Pardon my ignorance but what creates these files initially? The kernel
>> > >> > in response to some event or systemd or some other userspace
>> > >> > component?
>> > >>
>> > >> Yes AFAIK it is the former (psuedo filesystem similar to procfs, debugfs
>> > >> in that sense). This is also why I don't think that the PR mentioned is
>> > >> tested because cgroup2 fs labeling is done with genfscon and not fsuse
>> > >> trans or fsuse xattr so even if the files would be created by
>> > >> userspace (which I think is not the case) the specified automatic object
>> > >> type transition rule wouldnt work.
>> > >
>> > > Actually, type transitions on cgroupfs should work - I added special
>> > > hooks for kernfs just for that some time ago - see kernel commits
>> > > d0c9c153b4bd6963c8fcccbc0caa12e8fa8d971d..e19dfdc83b60f196e0653d683499f7bc5548128f.
>> >
>> > Interesting. I will try this out. Would this not require at least a
>> > "fsuse trans" statement in policy?
>>
>> No, it should work alongside genfscon. cgroupfs already was special
>> before that as it allowed relabeling despite genfscon being used.
>>
>> >
>> > https://github.com/SELinuxProject/refpolicy/blob/master/policy/modules/kernel/filesystem.te#L89
>> >
>> > Also I am not sure if that support would make much sense on a filesystem
>> > where files are created my the kernel in reaction to some event.
>>
>> It does make sense with named transitions, plus it was needed to make
>> even a simple parent-child inheritance work. Also, I believe some
>> cgroupfs files/directories (I think only directories?) can be created
>> by userspace, too.
>
> We should likely check that the SELinux Notebook and/or other
> documentation reflects this support and which filesystem types are
> supported, both wrt the filesystem types that support both genfscon +
> setxattr and those that support genfscon+setxattr+type_transition
> rules.

I tried this out:

1. yes you can create dirs on cgroup2 fs (but not files)
2. you can have a genfscon "cgroup2" alongside fsuse trans "cgroup2" but
if you do then any genfscon statements you might have like for example
genfscon "cgroup2" "/user.slice" cgroupfile_context) no longer
work. i.e. its pointless to have then both
3. even with a fsuse trans statement I could not make type transitions
work for directories created on cgroup2 fs.

Even if you could create directories on a cgroupfs with a type
transition, and if the files under that directory would inherited the
type of the parent, then that still would not be good enough to address
the memory.pressure file challenge because the point is to allow a
service to write the memory.pressure file but not other files in that
same directory.

-- 
gpg --locate-keys dominick.grift@xxxxxxxxxxx
Key fingerprint = FCD2 3660 5D6B 9D27 7FC6  E0FF DA7E 521F 10F6 4098
Dominick Grift