Re: [REVIEW][PATCH 0/43] Completing the user namespace

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Tue, 10 Apr 2012 18:01:16 -0700

Andrew Lutomirski <luto@xxxxxxx> writes:

> On Tue, Apr 10, 2012 at 4:50 PM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>> Andrew Lutomirski <luto@xxxxxxx> writes:
>>
>>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
>>> <ebiederm@xxxxxxxxxxxx> wrote:
>>>> Andy Lutomirski <luto@xxxxxxx> writes:
>>>>
>>>> My understanding of no_new_privs is that current_cred() including
>>>> the user, the user namespace and the security label will never change,
>>>> with the goal of making the security analysis simple.
>>>
>>> They can change but only if you already have the privilege to change
>>> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
>>> setuid, then drop caps is allowed and useful -- it's a race-free way
>>> to make sure that a given uid never executes without no_new_privs set.
>>>  I've implemented this as a pam module.
>>
>> Careful.  There is the security_task_fix_setuid call that will raise
>> your capabilities from cap->effective to cap->permitted if you call
>> setuid(0).  Which in the general case means you can regain all of the
>> root privileges if you only have CAP_SETUID.
>>
>
> That's fine.  If you're running with CAP_SETUID and default
> securebits, then you effectively have all capabilities already and
> don't need to exploit setuid binaries to gain them.  no_new_privs
> doesn't change that.  If you don't want to be able to gain all privs,
> change securebits or drop CAP_SETUID.  seccomp reduces the kernel
> attack surface; no_new_privs reduces the userspace attack surface.
> But see below...
>
>
>>
>>>> I don't recall how seccomp filters are dealt with if you don't have
>>>> no_new_privs enabled.  If seccomp filters installed by root
>>>> are dropped when we change privilege levels it might be worth looking
>>>> at how to keep a seccomp filter installed as long as you stay in
>>>> a user namespace.
>>>>
>>>
>>> They're not dropped.  I think in the current implementation they can't
>>> be dropped at all.
>>
>> Which makes sense.   Is this why you need no_new_privs?  So you can't run
>> seccomp on higher privileged executables and confuse them into keeping
>> privileges when they should not?
>
> Exactly.  seccomp is flexible enough that it's probably possible to
> confuse many setuid executables with it.
>
>>
>>>> The emphasis is a bit different from no_new_privs as the user_namespace
>>>> does not need to guarantee that the lsm will not change security labels,
>>>> etc.
>>>
>>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
>>> grants extra privileges that malfunctions when run inside a user
>>> namespace, can that be used to break out of LSM restrictions?
>>
>> I can't see how it would not be safe.
>>
>> Except for the user namespace pointer the state the LSM and the rest of
>> the kernel sees is the same state the kernel sees.  Aka userspace sees
>> uid 0, the LSM does not.  So I don't know why a LSM would get confused.
>>
>> Beyond that it is a bug for an LSM to grant permissions beyond the
>> core DAC model.  So the worst I can see is an LSM not grokking user
>> namespaces and getting confused and not restricting a process as
>> much as the designer of the LSM would like.
>
> Right.  Suppose you have some program that has extra restrictions
> applied by an LSM.  It executes a helper (e.g. Apache's suexec
> thing, but I bet there are more examples) which is supposed to be very
> careful not to leak privileges.  The LSM is set to restrict that
> helper less than the parent process.  But that program was written
> before user namespaces existed, and it has a bug (or missing feature)
> that allows its parent to exploit it when run inside an unmapped user
> namespace.  The parent can now escape from the LSM restrictions.
>
> no_new_privs is designed to prevent exactly this issue.

Currently the suid exec will fail because the uids don't map.

I might switch that around to simply ignoring the change of uid
on suid exec.  I have a patch in my devel tree that plays with
that idea.  However, even though I hit that case once in testing
(I think it was ping), I don't think running suid executables
is particularly interesting.

Certainly the application program won't care or break, because we are
still bounded by the usual DAC security.

I wonder a little if the lsm might change labels on exec of a
non suid binary.  That case is more interesting in the unmapped
unprivileged user namespace.

But I just can't seem to care.  The LSM is the line behind which we hide
the crazy.

The only real difference is that I can create namespaces, which are my
process local environment.  Unprivileged users setting up their own
mount namespace will likely allow all kinds of ways to sneak through the
path based protections of apparmor and tomoyo.  As for smack and selinux,
shrug.  I know selinux is at least a lot more path based than the
developers like to admit.  I know most of the /proc and /sys checks are
path based, although I don't think they depend on where you mount
things.  If you can somehow trigger a selinux labelling spree with a
different mount namespace, selinux will likely do some very wrong things.
smack is simple so it will probably work as intended.

Shrug.  There is nothing special here with the unmapped uid case of
user namespaces.  These are all things that have to be dealt with in some
fashion, but I do believe that is for the LSM maintainers to worry
about.

>>> If a user namespace has no visible effect on processes that aren't
>>> descendents of whoever created it, then creating one in no_new_privs
>>> mode should be safe.  On the other hand, it could be somewhat useless.
>>
>> Creating a user namespace will allow a process access to more kernel
>> facilities.  Aka you can (or at least will be able to) create network
>> namespaces and mount namespaces and the like.  That increases the
>> surface of the kernel an attacker can hit.
>>
>> So in a perfect kernel there are no effects on others.  In a scenario
>> where you are limiting how much of the kernel a user can use I think
>> you would want that.
>>
>> Still given that you aren't doing the very restrictive current_cred()
>> must not change I don't know how it matters, and a bpf based seccomp can
>> pretty easily filter out new user namespace creation.  Shrug.
>
> I'm not worried about that.  I'm more interested in whether
> unprivileged user namespace creation should require nnp and/or whether
> someone might want a mode in which a task has nnp set but can
> create a user namespace that allows setuid execution inside the
> namespace in spite of the nnp setting.  The latter is probably rather
> complicated to get right and depends on nonexistent filesystem
> features.

Hmm.  If the goal is to avoid confusing lsms, I think when user
namespaces and no new privs meet it becomes sensible for no new privs
to deny user namespace fiddling.  No clone(CLONE_NEWUSER), no
unshare(CLONE_NEWUSER), no setns(CLONE_NEWUSER).  Otherwise it becomes
trivial to confuse path based lsms.
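
For illustration only, here is a rough userspace approximation of that
policy using a bpf based seccomp filter, via the prctl interface found in
mainline kernels.  It is a sketch, not a kernel patch.  The helper names
deny_new_userns() and userns_blocked_in_child() are made up for this
example.  It assumes an architecture where the clone flags are syscall
argument 0 and little-endian argument layout (e.g. x86-64), it skips the
seccomp_data.arch check a real filter should do, and it does not cover
setns() or clone3().

```c
#define _GNU_SOURCE
#include <errno.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <sched.h>
#include <stddef.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef CLONE_NEWUSER
#define CLONE_NEWUSER 0x10000000
#endif
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif

/* Hypothetical helper: set no_new_privs, then install a filter that makes
 * unshare()/clone() fail with EPERM when the flags word (assumed to be
 * argument 0) contains CLONE_NEWUSER.  Everything else is allowed. */
static int deny_new_userns(void)
{
    struct sock_filter filter[] = {
        /* [0] load the syscall number */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* [1] unshare: go to [3]; otherwise fall through to [2] */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_unshare, 1, 0),
        /* [2] clone: go to [3]; any other syscall: go to [6] (allow) */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, __NR_clone, 0, 3),
        /* [3] load the low 32 bits of argument 0 (the flags) */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, args[0])),
        /* [4] CLONE_NEWUSER set: go to [5] (deny); clear: go to [6] */
        BPF_JUMP(BPF_JMP | BPF_JSET | BPF_K, CLONE_NEWUSER, 0, 1),
        /* [5] fail the call with EPERM */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        /* [6] let the call through */
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = sizeof(filter) / sizeof(filter[0]),
        .filter = filter,
    };

    /* Unprivileged tasks must set no_new_privs before installing a filter. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Hypothetical self-check, run in a child so the caller stays unfiltered.
 * Returns 1 if unshare(CLONE_NEWUSER) failed with EPERM, 0 if it did not,
 * -1 if the filter could not be installed.  Note that EPERM can also come
 * from a kernel that refuses unprivileged user namespaces on its own. */
static int userns_blocked_in_child(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        if (deny_new_userns() != 0)
            _exit(2);
        long r = syscall(__NR_unshare, CLONE_NEWUSER);
        _exit(r == -1 && errno == EPERM ? 0 : 1);
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    return WEXITSTATUS(status) == 1 ? 0 : -1;
}
```

A sandboxing launcher would call deny_new_userns() just before exec of the
untrusted program; the filter, like no_new_privs itself, is then inherited
by everything that program runs.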

If the goal is to avoid confusing privileged executables with seccomp,
I don't think it matters.  The user namespace guarantees you can't get
additional privileges.

As for requiring no new privs for creating a user namespace, ick.  I
think that will just break things.  suid exec is otherwise safe in a
user namespace and it needs to be supported.  If the LSMs have problems
the LSMs need to figure out how to cope.

I do think no new privs makes sense inside a user namespace exactly
the same way it makes sense if you don't think about user namespaces.
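
As a reminder of what no new privs gives you in either case, here is a
minimal sketch of its two basic properties as documented for mainline
kernels: once set the bit cannot be cleared, and it is inherited by
children.  The helper name nnp_is_sticky() is made up for this example,
and the check runs in a forked child so the caller is left untouched.

```c
#include <sys/prctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#define PR_GET_NO_NEW_PRIVS 39
#endif

/* Hypothetical helper.  Returns 1 if no_new_privs could be set, could not
 * be cleared again, and was inherited across fork(); 0 if any of those
 * failed; -1 if the kernel does not support no_new_privs or fork failed. */
static int nnp_is_sticky(void)
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Setting the bit needs no privilege at all. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
            _exit(2);
        /* Any attempt to clear it is expected to be refused. */
        if (prctl(PR_SET_NO_NEW_PRIVS, 0, 0, 0, 0) == 0)
            _exit(1);
        /* A grandchild should see the bit already set. */
        pid_t grandchild = fork();
        if (grandchild < 0)
            _exit(2);
        if (grandchild == 0)
            _exit(prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0) == 1 ? 0 : 1);
        if (waitpid(grandchild, &status, 0) != grandchild || !WIFEXITED(status))
            _exit(2);
        _exit(WEXITSTATUS(status));
    }
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    if (WEXITSTATUS(status) == 0)
        return 1;
    return WEXITSTATUS(status) == 1 ? 0 : -1;
}
```

Nothing in this check depends on which user namespace the caller is in,
which is the point: the bit behaves the same way inside and outside one.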

So I expect a really tight security policy would use a user_namespace +
seccomp + no new privs.

Eric
