Re: cgroup2 labeling question

Dominick Grift <dominick.grift@xxxxxxxxxxx> · Mon, 20 Mar 2023 17:37:53 +0100

Stephen Smalley <stephen.smalley.work@xxxxxxxxx> writes:

> On Mon, Mar 20, 2023 at 11:23 AM Dominick Grift
> <dominick.grift@xxxxxxxxxxx> wrote:
>>
>> Stephen Smalley <stephen.smalley.work@xxxxxxxxx> writes:
>>
>> > On Mon, Mar 20, 2023 at 10:46 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
>> >>
>> >> On Mon, Mar 20, 2023 at 3:19 PM Dominick Grift
>> >> <dominick.grift@xxxxxxxxxxx> wrote:
>> >> >
>> >> > Ondrej Mosnacek <omosnace@xxxxxxxxxx> writes:
>> >> >
>> >> > > On Mon, Mar 20, 2023 at 2:59 PM Dominick Grift
>> >> > > <dominick.grift@xxxxxxxxxxx> wrote:
>> >> > >>
>> >> > >> Stephen Smalley <stephen.smalley.work@xxxxxxxxx> writes:
>> >> > >>
>> >> > >> > On Mon, Mar 20, 2023 at 3:25 AM Dominick Grift
>> >> > >> > <dominick.grift@xxxxxxxxxxx> wrote:
>> >> > >> >>
>> >> > >> >>
>> >> > >> >> Hi,
>> >> > >> >>
>> >> > >> >> I was reading this pull request [1] and looked into how I might be able
>> >> > >> >> to implement this in policy but there seem to be some technical
>> >> > >> >> difficulties.
>> >> > >> >>
>> >> > >> >> * I already use getfscon to seperate the systemd user.slice because the
>> >> > >> >>   system manager delegates the user.slice to the user manager.
>> >> > >> >>
>> >> > >> >>   (genfscon "cgroup2" "/user.slice" cgroupfile_context)
>> >> > >> >>
>> >> > >> >>   In the past the proved to be a racy where systemd attempts to
>> >> > >> >>   write before the object has the context associated with the genfscon.
>> >> > >> >
>> >> > >> > I don't understand how this could be racy - genfscon-assigned contexts
>> >> > >> > should be assigned when the dentry is first instantiated via
>> >> > >> > inode_donit_with_dentry and therefore the inode shouldn't be
>> >> > >> > accessible to userspace prior to this initial assignment AFAIK.
>> >> > >> > Possibly I am missing something.
>> >> > >>
>> >> > >> I recall encountering this sporadically, but I admit that it has been a
>> >> > >> while since I supressed it in policy. I might try to reproduce. AFAIK my
>> >> > >> policy is the only policy that actually labels some trees on cgroup2 fs
>> >> > >> with private types currently.
>> >> > >>
>> >> > >> >
>> >> > >> >>   I decided to dontaudit attempts to write to the mislabeled object and
>> >> > >> >>   it *seems* as if systemd retries until it can write it i.e. when the
>> >> > >> >>   object carries the expected label and so that seems to work eventually
>> >> > >> >>   but it looks fragile.
>> >> > >> >>
>> >> > >> >> * The challenge with memory pressure implementation [2] is that these
>> >> > >> >>   "memory.pressure" files end up in random locations under
>> >> > >> >>   "/system.slice" for example:
>> >> > >> >>
>> >> > >> >>   /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
>> >> > >> >>
>> >> > >> >>   Where in the above systemd-journald.service might be
>> >> > >> >>   templated (systemd-journald@FOO.service). Point is that the path is
>> >> > >> >>   random. genfscon does not support regex and glob. I can't do for example:
>> >> > >> >>
>> >> > >> >>   (genfscon "cgroup2" "/system.slice/.*/memory.pressure"
>> >> > >> >>   cgroupfile_context)
>> >> > >> >>
>> >> > >> >>   Fortunately cgroup2fs supports relabeling but if systemd has to
>> >> > >> >>   manually relabel the cgroup files then I would imagine that this is
>> >> > >> >>   racy as well, and that does not really solve the underlying issue.
>> >> > >> >>
>> >> > >> >>   I am looking for ideas and suggestions
>> >> > >> >
>> >> > >> > Optimally one of two things would happen:
>> >> > >> > 1. The kernel would label the inode correctly when it is first created
>> >> > >> > (e.g. by augmenting genfscon to support more general matching), or
>> >> > >> > 2. The userspace component that creates these files would label them
>> >> > >> > correctly at creation (via setfscreatecon() prior to creation).
>> >> > >>
>> >> > >> Agree but 1. would require regex/glob support for genfscon and 2. these
>> >> > >> files aren't "created" by userspace AFAIK and so setfscreatecon or
>> >> > >> automatic object type transitions are probably not an option here.
>> >> > >>
>> >> > >> >
>> >> > >> > Pardon my ignorance but what creates these files initially? The kernel
>> >> > >> > in response to some event or systemd or some other userspace
>> >> > >> > component?
>> >> > >>
>> >> > >> Yes AFAIK it is the former (psuedo filesystem similar to procfs, debugfs
>> >> > >> in that sense). This is also why I don't think that the PR mentioned is
>> >> > >> tested because cgroup2 fs labeling is done with genfscon and not fsuse
>> >> > >> trans or fsuse xattr so even if the files would be created by
>> >> > >> userspace (which I think is not the case) the specified automatic object
>> >> > >> type transition rule wouldnt work.
>> >> > >
>> >> > > Actually, type transitions on cgroupfs should work - I added special
>> >> > > hooks for kernfs just for that some time ago - see kernel commits
>> >> > > d0c9c153b4bd6963c8fcccbc0caa12e8fa8d971d..e19dfdc83b60f196e0653d683499f7bc5548128f.
>> >> >
>> >> > Interesting. I will try this out. Would this not require at least a
>> >> > "fsuse trans" statement in policy?
>> >>
>> >> No, it should work alongside genfscon. cgroupfs already was special
>> >> before that as it allowed relabeling despite genfscon being used.
>> >>
>> >> >
>> >> > https://github.com/SELinuxProject/refpolicy/blob/master/policy/modules/kernel/filesystem.te#L89
>> >> >
>> >> > Also I am not sure if that support would make much sense on a filesystem
>> >> > where files are created my the kernel in reaction to some event.
>> >>
>> >> It does make sense with named transitions, plus it was needed to make
>> >> even a simple parent-child inheritance work. Also, I believe some
>> >> cgroupfs files/directories (I think only directories?) can be created
>> >> by userspace, too.
>> >
>> > We should likely check that the SELinux Notebook and/or other
>> > documentation reflects this support and which filesystem types are
>> > supported, both wrt the filesystem types that support both genfscon +
>> > setxattr and those that support genfscon+setxattr+type_transition
>> > rules.
>>
>> I tried this out:
>>
>> 1. yes you can create dirs on cgroup2 fs (but not files)
>> 2. you can have a genfscon "cgroup2" alongside fsuse trans "cgroup2" but
>> if you do then any genfscon statements you might have like for example
>> genfscon "cgroup2" "/user.slice" cgroupfile_context) no longer
>> work. i.e. its pointless to have then both
>> 3. even with a fsuse trans statement I could not make type transitions
>> work for directories created on cgroup2 fs.
>>
>> Even if you could create directories on a cgroupfs with a type
>> transition, and if the files under that directory would inherited the
>> type of the parent, then that still would not be good enough to address
>> the memory.pressure file challenge because the point is to allow a
>> service to write the memory.pressure file but not other files in that
>> same directory.
>
> You don't want a fs_use_trans statement in your policy for cgroup2.
> Just genfscon statements. The kernel will still check for
> type_transition rules and apply them to files at creation time without
> having a fs_use_trans, but having a fs_use_trans will override
> genfscon.

Thanks for clarification. to reiterate I could not get type_transition
to work with only genfscon either when creating a directory on cgroup2
fs but maybe I was overlooking something.

-- 
gpg --locate-keys dominick.grift@xxxxxxxxxxx
Key fingerprint = FCD2 3660 5D6B 9D27 7FC6  E0FF DA7E 521F 10F6 4098
Dominick Grift