Re: RFC: thoughts on SELinux namespacing

Paul Moore <paul@xxxxxxxxxxxxxx> · Fri, 13 Oct 2023 15:40:09 -0400

On Fri, Oct 13, 2023 at 12:16 PM Stephen Smalley
<stephen.smalley.work@xxxxxxxxx> wrote:
> On Wed, Oct 11, 2023 at 6:55 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >
> > Hello all,
> >
> > The SELinux namespace effort has been stuck for several years as we
> > try to solve the problem of managing individual file labels across
> > multiple namespaces. Our only solution thus far, adding namespace
> > specific xattrs to each file, is relatively simple but doesn't scale,
> > and has the potential to become a management problem as a namespace
> > specific identifier needs to be encoded in the xattr name.  Having
> > continued to think about this problem, I believe I have an idea which
> > might allow us to move past this problem and start making progress on
> > SELinux namespaces.  I'd like to get everyone's thoughts on the
> > proposal below ...
> >
> > THE IDEA
> >
> > With the understanding that we only have one persistent label
> > per-file, we need to get a little creative with how we represent a
> > single entity's label in both the parent and child namespaces.  Since
> > our existing approach towards SELinux policy for containers and VMs
> > (sVirt) is to treat the container/VM as a single security domain,
> > let's continue this philosophy to a SELinux namespace: a child
> > namespace will appear as a single SELinux domain/type in the parent
> > namespace, with newly created processes and objects all appearing to
> > have the same type from the parent's point of view.  From the child
> > namespace's perspective, everything will behave as it would
> > normally: processes would run in multiple domains as determined by the
> > namespace's policy, with files labeled according to the labeling rules
> > defined in the namespace's policy (e.g. xattrs, context mounts, etc.).
>
> I don't have any problems with the idea. However, where I got stuck
> with the original selinux namespace patches was not per-namespace
> filesystem security xattrs (which was James' contribution) ...

Thanks for taking a look and reviewing the idea.

The multiple xattr approach is okay-ish if you need *something* to get
past the labeling hurdle so you can work on other aspects of the
implementation, but it is not a viable solution upstream.  The scaling
and management difficulties make it a non-starter in my opinion.

> ... but rather
> the need to support per-namespace in-core inode and superblock
> security blobs.

I didn't get into any implementation details because I was still
wrestling with the design issue of how we deal with only a single
on-disk label.  In my mind that needed to be solved before we could
spend time thinking about how to implement it in a reasonable manner.

Regardless, you bring up a good point: there are still a number of
implementation challenges with namespacing SELinux.  There is the
issue of supporting multiple labels on kernel entities such as
superblock, inodes, tasks, etc., but I have a hunch that the more
challenging issue is going to be the various LSM/SELinux external
"things" that deal with a single secid/secctx.  The LSM stacking
effort might end up helping us in that area, but we will have to see
how it develops.

> You'd have to go back to my original posted patch
> series or the older selinuxns branches of my github repo to see my
> attempt at supporting those because they were dropped from the
> working-selinuxns branch due to the ongoing reworking of LSM to handle
> blob allocation by the security framework rather than by the
> individual security modules. I couldn't figure out how to make that
> work safely and efficiently, and AFAICT that still has to be addressed
> for the above idea to work.

Agreed, supporting multiple different views of an entity and mapping
that to a namespace in a quick and efficient manner is probably going
to be one of the larger technical challenges.  I won't pretend to have
an answer for this yet, but I do believe we can figure something out.

> > THOUGHTS ON MAKING IT WORK
> >
> > One of the bigger challenges here is how to handle the case of the
> > parent mounting a filesystem for full use by the child namespace
> > (per-file labeling, etc.).  Above I talked about how filesystems would
> > be labeled according to the mounting namespace, so if we want to
> > delegate labeling of the filesystem to a child namespace (without
> > allowing the child to perform the mount) we need to have a mechanism
> > to indicate that the mounting namespace is deferring labeling to a
> > different namespace.  I think the obvious solution to that would be to
> > add two new mount options: "selinuxns_outer=<label>" and
> > "selinuxns_owner=<label>".  The "selinuxns_outer" option would
> > accomplish two things: mark the filesystem for deferred labeling by
> > another namespace, and establish a single label, similar to a context
> > mount, that the mounting namespace would see instead of whatever
> > labeling the filesystem would normally support.  The "selinuxns_owner"
> > option would specify the domain label of the child namespace, granting
> > that domain control over whatever labeling is supported by the
> > filesystem.  In most normal use cases where the child namespace runs
> > with a single domain/type from the parent's perspective I would expect
> > "selinuxns_outer" and "selinuxns_owner" to be set to the same value,
> > although that is not a requirement.
>
> So with my earlier patch set (the one in my older selinuxns branch),
> one could already do the equivalent of selinuxns_outer just using the
> existing context= mount option. This is because it allowed for
> per-namespace superblock security blobs, so you could context mount in
> the parent namespace while still selecting per-file labeling in the
> child. That said, it had the issues I referenced above wrt safety and
> efficiency.

I'm open to other ideas on how to make this work, but I do think it is
important to separate a context mounted filesystem that is shared
across multiple namespaces from a "selinuxns_outer" (or whatever we
want to use) mounted filesystem that appears like a context mount to
the mounting namespace while supporting native labeling for a child
namespace.  If there is a clever way to do that with existing mount
options that's even better.
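
To make the proposal a bit more concrete, here is a rough sketch of how
a container manager might assemble the option string it would hand to
mount(2) as the data argument.  This is only an illustration: the two
option names exist nowhere outside this thread, nothing parses them
today, and the helper name is made up.  The values are quoted the same
way a context= value is quoted when the label contains commas.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: format the two PROPOSED mount options into buf.
 * "selinuxns_outer" and "selinuxns_owner" are not implemented in any
 * kernel; this only shows what the option string could look like.
 * Labels are wrapped in double quotes because MLS category sets can
 * contain commas, which would otherwise split the option list.
 * Returns 0 on success, -1 if buf is too small or formatting fails.
 */
static int selinuxns_build_opts(char *buf, size_t len,
                                const char *outer, const char *owner)
{
    int n = snprintf(buf, len,
                     "selinuxns_outer=\"%s\",selinuxns_owner=\"%s\"",
                     outer, owner);

    /* snprintf reports the length it wanted; reject truncation. */
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With example labels such as system_u:object_r:container_file_t:s0 for
the outer label and system_u:system_r:container_t:s0 for the owner, the
mounting namespace would see every file with the first label while the
second names the domain that is handed control of native labeling.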

> For selinuxns_owner, I'm not clear on where/how that would
> be used.

My idea behind the "selinuxns_owner" option was to provide an easy way
to identify which domains in the mounting/parent namespace would have
the ability to enable native filesystem labeling in their own child
namespace.  It seemed like a reasonable way to simplify the SELinux
namespace API by not needing to specify all of the mounted filesystems
that the child namespace would have labeling control over when the
child namespace was initiated.
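
As a very rough sketch of the check I have in mind, assuming the two
option values end up stored with the superblock in some form (the
struct and function names below are invented for illustration, and a
real implementation would compare SIDs inside the kernel rather than
strings):

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-superblock record of the two proposed options. */
struct selinuxns_sb_opts {
    const char *outer; /* label the mounting namespace sees on all files */
    const char *owner; /* parent-namespace domain given labeling control */
};

/*
 * Would a child namespace created by a task running in parent_domain
 * be allowed to enable native labeling on this filesystem?  Under the
 * proposal that is simply a match against "selinuxns_owner".
 */
static bool selinuxns_may_own(const struct selinuxns_sb_opts *opts,
                              const char *parent_domain)
{
    if (!opts || !opts->owner || !parent_domain)
        return false; /* not mounted for deferred labeling */

    return strcmp(opts->owner, parent_domain) == 0;
}
```

The point is that the decision is driven by the mount itself, so the
namespace creation API does not have to enumerate filesystems.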

> Note that the context you assign to files will quite often
> differ from the context assigned to the processes; hence, if
> selinuxns_owner is meant to be the context of a process, it usually
> won't be the same as selinuxns_outer.

That came up in some offline discussions too.  When I was kicking
things around in my mind it was easier to reason about things,
especially nested namespaces, if the child namespace was represented
by a single type in the parent namespace.  Unfortunately that
simplification ended up leaking out into my email.  If it makes things
easier to read/understand, I would recommend simply removing that
sentence from my email; it shouldn't affect the design either way.

-- 
paul-moore.com