On Tue, Apr 28, 2020 at 4:38 PM Daniel Burgener <dburgener@xxxxxxxxxxxxxxxxxxx> wrote:
>
> On 4/28/20 3:34 PM, Ondrej Mosnacek wrote:
> > On Tue, Apr 28, 2020 at 8:54 PM Stephen Smalley
> > <stephen.smalley.work@xxxxxxxxx> wrote:
> >> On Mon, Apr 27, 2020 at 4:40 PM Daniel Burgener
> >> <dburgener@xxxxxxxxxxxxxxxxxxx> wrote:
> >>> Hello all,
> >>>
> >>> We've noticed a few instances of strange failures in userspace object
> >>> managers happening immediately after a policy load, which we believe
> >>> we've traced to a race condition in the kernel, and wanted to get your
> >>> thoughts on our diagnosis and the appropriate fix before creating a
> >>> patch, in case we've missed something.
> >>>
> >>> The issue is that userspace object managers rely on /sys/fs/selinux to
> >>> determine the mapping of object class and permission strings to numeric
> >>> identifiers; however, immediately after a policy load,
> >>> /sys/fs/selinux/{booleans,class,policy_capabilities} are deleted and
> >>> recreated. This leaves a window where the object classes (etc.) appear
> >>> to be undefined, even if they are defined in both the old and new
> >>> policies.
> >>>
> >>> We have observed this with both the dbus and systemd userspace object
> >>> managers, and it can be reproduced straightforwardly by running the
> >>> following (borrowed from the bug linked below):
> >>>
> >>> # (while true; do cat /sys/fs/selinux/class/service/index >/dev/null; done) &
> >>> # while true; do sudo load_policy; echo -n .; sleep 0.1; done
> >>>
> >>> Periodically, we will get "No such file or directory" messages printed
> >>> to stderr. In the event of a userspace object manager using libselinux
> >>> to check a userspace permission, that will result in a USER_AVC message
> >>> indicating an unknown object class, and in the event that
> >>> --handle-unknown is set to "deny", it will deny the access.
> >>>
> >>> It seems to me as though some sort of locking should occur in selinuxfs
> >>> to prevent access to the files being deleted and recreated during the
> >>> policy load, so that userspace programs relying on them (in particular
> >>> userspace object managers doing class lookups) get a consistent view of
> >>> the classes, perms, booleans and capabilities in the loaded policy.
> >>>
> >>> This seems to be related to
> >>> https://github.com/SELinuxProject/selinux-kernel/issues/42, but I believe
> >>> it is a different case. The deadlock in that bug seems to be related to
> >>> the underlying filesystem functions, specifically around directory
> >>> deletion, while this is an issue with the selinuxfs logic specifically.
> >>> The above linked issue appears to have been fixed in recent upstream
> >>> kernels, per the bug, but I have verified the issue I am discussing here
> >>> in 5.7.0-rc3.
> >>>
> >>> It seems to me as though, from the perspective of userspace, all of
> >>> sel_make_policy_nodes (or at least each of its component functions)
> >>> should be atomic. There was some discussion in a previous thread
> >>> (https://lore.kernel.org/selinux/20181002155810.GP32577@xxxxxxxxxxxxxxxxxx/)
> >>> around a significant refactor of policy loading in general. It appears
> >>> as though the direct issue there of access during the deletion has been
> >>> resolved (commit d4f4de5e5ef8efde85febb6876cd3c8ab1631999), although the
> >>> complete suggested fix of refactoring policy_load into two parts has not
> >>> been done. Would that refactor be the right approach to the problem I
> >>> am trying to solve?
> >>> Would a patch for adding locking around the selinuxfs delete/recreate
> >>> operation be considered? That wouldn't address all the concerns (namely
> >>> the potential to access a view of the policy that doesn't match the
> >>> currently loaded policy, and error recovery in the case that
> >>> sel_make_nodes fails), but it would improve the reliability of existing
> >>> userspace object managers.
> >>>
> >>> I'm happy to create and submit a patch, but I wanted to get the
> >>> community's thoughts on the problem and the correct approach first.
> >> I think the best approach is the one sketched by viro, but I recognize
> >> that's a big lift. Willing to help, but not entirely clear on the
> >> interactions with the dcache for atomically swapping the new trees into
> >> place.
> >>
> >> I suspect this is just being exposed now due to more recent changes in
> >> userspace to try to fully support changes in class/permission values at
> >> runtime. Previously, userspace object managers tended to only map them
> >> during initialization or on first use and then would just keep using the
> >> cached values, whereas now they try to map them lazily and flush their
> >> caches on a reload notification. Some of those changes were in libselinux
> >> and others in the userspace object managers (e.g. systemd, dbus-broker or
> >> dbusd). Not sure exactly what versions of libselinux, systemd, and
> >> dbusd/dbus-broker you are using.
> >>
> >> selinuxfs itself does take a mutex around the entire policy load,
> >> including the delete/re-create sequence, but that only serializes with
> >> other policy operations (load, read, booleans), not with dcache
> >> walk/lookup.
> >>
> >> cc'ing Ondrej since he attempted to fix the earlier selinuxfs bug and may
> >> have an opinion on the best way forward for this issue.
> > Well, I attempted a few times and each time failed miserably :) Thankfully,
> > hitting the bug via another fs eventually forced viro to fix it (in a way I
> > could probably never come up with myself). However, I remember originally
> > trying to fix the bug by making the swapover atomic, but later I realized
> > that these are actually two independent issues. After that I didn't get
> > back to atomizing the swapover, but IIRC I had the impression that it might
> > not be all that difficult... (at least if I ignore possible failures during
> > the new dentry tree creation for now - it would still be unsafe, but it
> > would be a start). But knowing VFS, I bet when I actually try it, it will
> > prove to be much more tricky ;)
> >
> > I'll try to find some time to sit down to it again, but at the moment I'm
> > juggling a bunch of higher-priority stuff, so it might be a while before I
> > get to it.
>
> The weird thing about atomizing the swapover in isolation is that it's
> unclear what action to take on failure. The existing code looks like it
> would leave a broken version of selinuxfs lying around. Completing the
> swapover even on failure would give us an atomic version of the current
> situation, but feels extra weird. If ignoring the failure case as Ondrej
> suggests would be acceptable in a patch, that sounds like the quickest way
> to address at least part of the problem (the refactor of
> security_load_policy doesn't get us much by itself, since in the current
> state we can't roll back selinuxfs after we've started).
I don't entirely understand what you mean by the above, since viro's sketch
of a new policy load sequence does permit handling the error case in the
middle of the new selinuxfs directory/file creation by just throwing away
the new entries altogether and leaving the old policydb/sidtab + old
selinuxfs in place. That said, I'm not saying you have to do the whole thing
at once.

> I'd be happy to start looking at either half, although I'd prefer to
> start with atomizing the tree swapover to solve my immediate problem, if
> that would have a chance at getting merged by itself given the issues
> around the failure case.

Certainly willing to look at incremental improvements here.
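For reference, a rough sketch of the userspace-side pattern that hits this
window (illustrative only, not the actual systemd/dbus-broker code; it just
mirrors the "service" class from your reproducer and the lazy-map /
flush-on-reload behavior described above):

/*
 * Illustrative sketch only (not the actual systemd/dbus-broker code): a
 * userspace object manager that maps the "service" class lazily and
 * flushes its cached value on a policy-load notification, which is the
 * pattern that can land in the selinuxfs delete/recreate window.
 */
#include <stdio.h>
#include <selinux/selinux.h>

static security_class_t service_class; /* 0 == not mapped / unknown */

/* Invoked by libselinux when a policy-load netlink notification arrives. */
static int on_policyload(int seqno)
{
	(void)seqno;
	service_class = 0; /* flush the cached string -> class mapping */
	return 0;
}

static security_class_t lookup_service_class(void)
{
	if (!service_class) {
		/*
		 * Resolves the class via /sys/fs/selinux/class/service/index
		 * (modulo libselinux's own caching).  If the lookup lands in
		 * the window where selinuxfs has deleted but not yet
		 * recreated the class directory, this fails and the object
		 * manager reports an unknown class.
		 */
		service_class = string_to_security_class("service");
		if (!service_class)
			fprintf(stderr, "unknown class 'service' (raced with policy load?)\n");
	}
	return service_class;
}

int main(void)
{
	union selinux_callback cb;

	cb.func_policyload = on_policyload;
	selinux_set_callback(SELINUX_CB_POLICYLOAD, cb);

	/*
	 * The callback only fires while the process is consuming AVC netlink
	 * notifications (e.g. via avc_open()/selinux_check_access() or an
	 * avc_netlink_* loop); that plumbing is omitted here for brevity.
	 */
	lookup_service_class();
	return 0;
}

Retrying the lookup on failure papers over the problem but doesn't remove
the window, which is why making the selinuxfs swapover atomic on the kernel
side matters.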