Re: Login broken with old userspace (was Re: [PATCH v2] selinux: introduce an initial SID for early boot processes)

Paul Moore <paul@xxxxxxxxxxxxxx> · Tue, 1 Aug 2023 15:02:35 -0400

On Tue, Aug 1, 2023 at 9:24 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> On Fri, Jul 28, 2023 at 5:12 PM Paul Moore <paul@xxxxxxxxxxxxxx> wrote:
> >
> > On Fri, Jul 28, 2023 at 9:24 AM Christian Göttsche
> > <cgzones@xxxxxxxxxxxxxx> wrote:
> > >
> > > On Fri, 28 Jul 2023 at 15:14, Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> > > >
> > > > On Fri, Jul 28, 2023 at 1:52 PM Stephen Smalley
> > > > <stephen.smalley.work@xxxxxxxxx> wrote:
> > > > >
> > > > > On Fri, Jul 28, 2023 at 7:36 AM Ondrej Mosnacek <omosnace@xxxxxxxxxx> wrote:
> > > > > >
> > > > > > On Fri, Jul 28, 2023 at 4:12 AM Michael Ellerman <mpe@xxxxxxxxxxxxxx> wrote:
> > > > > > >
> > > > > > > Ondrej Mosnacek <omosnace@xxxxxxxxxx> writes:
> > > > > > > > Currently, SELinux doesn't allow distinguishing between kernel threads
> > > > > > > > and userspace processes that are started before the policy is first
> > > > > > > > loaded - both get the label corresponding to the kernel SID. The only
> > > > > > > > way a process that persists from early boot can get a meaningful label
> > > > > > > > is by doing a voluntary dyntransition or re-executing itself.
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > This commit breaks login for me when booting linux-next kernels with old
> > > > > > > userspace, specifically Ubuntu 16.04 on ppc64le. 18.04 is OK.
> > > > > > >
> > > > > > > The symptom is that login never accepts the root password, it just
> > > > > > > always says "Login incorrect".
> > > > > > >
> > > > > > > Bisect points to this commit.
> > > > > > >
> > > > > > > Reverting this commit on top of next-20230726, fixes the problem
> > > > > > > (ie. login works again).
> > > > > > >
> > > > > > > Booting with selinux=0 also fixes the problem.
> > > > > > >
> > > > > > > Is this expected? The change log below suggests backward compatibility
> > > > > > > was considered, is 16.04 just too old?
> > > > > >
> > > > > > Hi Michael,
> > > > > >
> > > > > > I can reproduce it on Fedora 38 when I boot with SELINUX=disabled in
> > > > > > /etc/selinux/config (+ a kernel including that commit), so it likely
> > > > > > isn't caused by the userspace being old. Can you check what you have
> > > > > > in /etc/selinux/config (or if it exists at all)?
> > > > > >
> > > > > > We have deprecated and removed the "runtime disable" functionality in
> > > > > > SELinux recently [1], which was used to implement "disabling" SELinux
> > > > > > via the /etc/selinux/config file, so now the situation (selinux=0 +
> > > > > > SELINUX=disabled in /etc/selinux/config) leads to a state where
> > > > > > SELinux is enabled, but no policy is loaded (and no enforcement is
> > > > > > done). Such a state mostly behaves as if SElinux was truly disabled
> > > > > > (via kernel command line), but there are some subtle differences and I
> > > > > > believe we don't officially support it (Paul might clarify). With
> > > > > > latest kernels it is recommended to either disable SELinux via the
> > > > > > kernel command line (or Kconfig[2]) or to boot it in Enforcing or
> > > > > > Permissive mode with a valid/usable policy installed.
> > > > > >
> > > > > > So I wonder if Ubuntu ships by default with the bad configuration or
> > > > > > if it's just a result of using the custom-built linux-next kernel (or
> > > > > > some changes on your part). If Ubuntu's stock kernel is configured to
> > > > > > boot with SELinux enabled by default, they should also by default ship
> > > > > > a usable policy and SELINUX=permissive/enforcing in
> > > > > > /etc/selinux/config (or configure the kernel[2] or bootloader to boot
> > > > > > with SELinux disabled by default). (Although if they ship a pre-[1]
> > > > > > kernel, they may continue to rely on the runtime disable
> > > > > > functionality, but it means people will have to be careful when
> > > > > > booting newer or custom kernels.)
> > > > > >
> > > > > > That said, I'd like to get to the bottom of why the commit causes the
> > > > > > login to fail and fix it somehow. I presume something in PAM chokes on
> > > > > > the fact that userspace tasks now have "init" instead of "kernel" as
> > > > > > the pre-policy-load security context, but so far I haven't been able
> > > > > > to pinpoint the problem. I'll keep digging...
> > > > > >
> > > > > > [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f22f9aaf6c3d92ebd5ad9e67acc03afebaaeb289
> > > > > > [2] via CONFIG_LSM (or CONFIG_SECURITY_SELINUX_BOOTPARAM_VALUE on older kernels)
> > > > >
> > > > > Prior to selinux userspace commit
> > > > > 685f4aeeadc0b60f3770404d4f149610d656e3c8 ("libselinux:
> > > > > is_selinux_enabled(): drop no-policy-loaded test.") libselinux was
> > > > > checking the result of reading /proc/self/attr/current to see if it
> > > > > returned the "kernel" string as a means of detecting a system with
> > > > > SELinux enabled but no policy loaded, and treated that as if SELinux
> > > > > were disabled. Hence, this does break old userspace. Not sure though
> > > > > why you'd see the same behavior with modern libselinux.
> > > >
> > > > Hm... now I tried booting the stock Fedora kernel (without the early
> > > > boot initial SID commit) and I got the same failure to login as with
> > > > the new kernel. So if Ubuntu 16.04 ships with pre-685f4aeeadc0
> > > > libselinux (quite possible), then it seems that the scenario with
> > > > terminal login + SELinux enabled + policy not loaded only works with
> > > > pre-685f4aeeadc0 libselinux and pre-5b0eea835d4e kernel, the other
> > > > combinations are broken. With pre-685f4aeeadc0 libselinux +
> > > > post-5b0eea835d4e kernel it is expected as you say (and probably
> > > > inevitable barring some hack on the kernel side), but it's not clear
> > > > why also only updating libselinux seems to break it... /sys/fs/selinux
> > > > is not mounted in my scenario, so there must be something else coming
> > > > into play.
> > > >
> > > >
> > > > --
> > > > Ondrej Mosnacek
> > > > Senior Software Engineer, Linux Security - SELinux kernel
> > > > Red Hat, Inc.
> > > >
> > >
> > > Completely untested:
> > >
> > > diff --git a/security/selinux/ss/services.c b/security/selinux/ss/services.c
> > > index 2c5be06fbada..1ed275bd4551 100644
> > > --- a/security/selinux/ss/services.c
> > > +++ b/security/selinux/ss/services.c
> > > @@ -1322,8 +1322,19 @@ static int security_sid_to_context_core(u32
> > > sid, char **scontext,
> > >         if (!selinux_initialized()) {
> > >                 if (sid <= SECINITSID_NUM) {
> > >                         char *scontextp;
> > > -                       const char *s = initial_sid_to_string[sid];
> > > +                       const char *s;
> > >
> > > +                       /*
> > > +                        * Hide the context split of kernel threads and
> > > +                        * userspace threads from userspace before the first
> > > +                        * policy is loaded.  Userspace, e.g. libselinux prior
> > > +                        * to v2.6 or systemd, depends on the context being
> > > +                        * "kernel".
> > > +                        */
> > > +                       if (sid == SECINITSID_INIT)
> > > +                               sid = SECINITSID_KERNEL;
> > > +
> > > +                       s = initial_sid_to_string[sid];
> > >                         if (!s)
> > >                                 return -EINVAL;
> > >                         *scontext_len = strlen(s) + 1;
> >
> > I think I'd rather see something that does the following:
> >
> > 1. Convert all direct access of @initial_sid_to_string to calls to
> > security_get_initial_sid_context().  I think we can leave all the
> > stuff under scripts/ as-is, but I didn't think about it that hard, so
> > some additional thought would be required here.
>
> What should we then do with the reverse translation in
> security_context_to_sid_core()? It seems it is currently possible for
> a process to e.g. change its SID to another initial SID before the
> policy is loaded - would we let it to set itself to INIT and yet still
> return back KERNEL afterwards?

Yeah, I wasn't thinking of the reverse translation.  While the sid to
context string mapping is definitely flexible, I really don't like the
idea of changing the sid assigned to an entity, even if there isn't a
policy loaded (yet).

> > 2. Modify security_get_initial_sid_context() to so something similar
> > to what Christian proposed, e.g. translate INIT to KERNEL, but do so
> > only when POLICYDB_CAP_USERSPACE_INITIAL_CONTEXT is not enabled.  I
> > believe this should cover both the uninitialized and old policy case.
>
> You don't know whether the policycap is enabled or not until the
> policy is loaded and at that point it doesn't matter because then you
> already have a full context assigned to the SID.

My perspective is that there are really only two states we care about:
policy loaded with the INITIAL_CONTEXT policycap, and everything else
(including no policy loaded).

> OTOH, I don't know if we have another choice given the "no regressions" rule...

Here's the thing, we're at -rc4 right now and I'm a little concerned
that I haven't seen a fix on-list for this.  If we don't at least have
some sort of design by the end of this week, with a patch early next
week, I'm going to have to revert the patch in selinux/next.

You don't have to like the silly little design above, but we need
*something* to fix this.

-- 
paul-moore.com