Re: cgroup2 labeling question

Ondrej Mosnacek <omosnace@xxxxxxxxxx> · Mon, 20 Mar 2023 15:46:12 +0100

On Mon, Mar 20, 2023 at 3:19 PM Dominick Grift
<dominick.grift@xxxxxxxxxxx> wrote:
>
> Ondrej Mosnacek <omosnace@xxxxxxxxxx> writes:
>
> > On Mon, Mar 20, 2023 at 2:59 PM Dominick Grift
> > <dominick.grift@xxxxxxxxxxx> wrote:
> >>
> >> Stephen Smalley <stephen.smalley.work@xxxxxxxxx> writes:
> >>
> >> > On Mon, Mar 20, 2023 at 3:25 AM Dominick Grift
> >> > <dominick.grift@xxxxxxxxxxx> wrote:
> >> >>
> >> >>
> >> >> Hi,
> >> >>
> >> >> I was reading this pull request [1] and looked into how I might be able
> >> >> to implement this in policy but there seem to be some technical
> >> >> difficulties.
> >> >>
> >> >> * I already use getfscon to seperate the systemd user.slice because the
> >> >>   system manager delegates the user.slice to the user manager.
> >> >>
> >> >>   (genfscon "cgroup2" "/user.slice" cgroupfile_context)
> >> >>
> >> >>   In the past the proved to be a racy where systemd attempts to
> >> >>   write before the object has the context associated with the genfscon.
> >> >
> >> > I don't understand how this could be racy - genfscon-assigned contexts
> >> > should be assigned when the dentry is first instantiated via
> >> > inode_donit_with_dentry and therefore the inode shouldn't be
> >> > accessible to userspace prior to this initial assignment AFAIK.
> >> > Possibly I am missing something.
> >>
> >> I recall encountering this sporadically, but I admit that it has been a
> >> while since I supressed it in policy. I might try to reproduce. AFAIK my
> >> policy is the only policy that actually labels some trees on cgroup2 fs
> >> with private types currently.
> >>
> >> >
> >> >>   I decided to dontaudit attempts to write to the mislabeled object and
> >> >>   it *seems* as if systemd retries until it can write it i.e. when the
> >> >>   object carries the expected label and so that seems to work eventually
> >> >>   but it looks fragile.
> >> >>
> >> >> * The challenge with memory pressure implementation [2] is that these
> >> >>   "memory.pressure" files end up in random locations under
> >> >>   "/system.slice" for example:
> >> >>
> >> >>   /sys/fs/cgroup/system.slice/systemd-journald.service/memory.pressure
> >> >>
> >> >>   Where in the above systemd-journald.service might be
> >> >>   templated (systemd-journald@FOO.service). Point is that the path is
> >> >>   random. genfscon does not support regex and glob. I can't do for example:
> >> >>
> >> >>   (genfscon "cgroup2" "/system.slice/.*/memory.pressure"
> >> >>   cgroupfile_context)
> >> >>
> >> >>   Fortunately cgroup2fs supports relabeling but if systemd has to
> >> >>   manually relabel the cgroup files then I would imagine that this is
> >> >>   racy as well, and that does not really solve the underlying issue.
> >> >>
> >> >>   I am looking for ideas and suggestions
> >> >
> >> > Optimally one of two things would happen:
> >> > 1. The kernel would label the inode correctly when it is first created
> >> > (e.g. by augmenting genfscon to support more general matching), or
> >> > 2. The userspace component that creates these files would label them
> >> > correctly at creation (via setfscreatecon() prior to creation).
> >>
> >> Agree but 1. would require regex/glob support for genfscon and 2. these
> >> files aren't "created" by userspace AFAIK and so setfscreatecon or
> >> automatic object type transitions are probably not an option here.
> >>
> >> >
> >> > Pardon my ignorance but what creates these files initially? The kernel
> >> > in response to some event or systemd or some other userspace
> >> > component?
> >>
> >> Yes AFAIK it is the former (psuedo filesystem similar to procfs, debugfs
> >> in that sense). This is also why I don't think that the PR mentioned is
> >> tested because cgroup2 fs labeling is done with genfscon and not fsuse
> >> trans or fsuse xattr so even if the files would be created by
> >> userspace (which I think is not the case) the specified automatic object
> >> type transition rule wouldnt work.
> >
> > Actually, type transitions on cgroupfs should work - I added special
> > hooks for kernfs just for that some time ago - see kernel commits
> > d0c9c153b4bd6963c8fcccbc0caa12e8fa8d971d..e19dfdc83b60f196e0653d683499f7bc5548128f.
>
> Interesting. I will try this out. Would this not require at least a
> "fsuse trans" statement in policy?

No, it should work alongside genfscon. cgroupfs already was special
before that as it allowed relabeling despite genfscon being used.

>
> https://github.com/SELinuxProject/refpolicy/blob/master/policy/modules/kernel/filesystem.te#L89
>
> Also I am not sure if that support would make much sense on a filesystem
> where files are created my the kernel in reaction to some event.

It does make sense with named transitions, plus it was needed to make
even a simple parent-child inheritance work. Also, I believe some
cgroupfs files/directories (I think only directories?) can be created
by userspace, too.

-- 
Ondrej Mosnacek
Senior Software Engineer, Linux Security - SELinux kernel
Red Hat, Inc.