Re: [REVIEW][PATCH 0/43] Completing the user namespace

Andrew Lutomirski <luto@xxxxxxx> · Tue, 10 Apr 2012 16:56:54 -0700

On Tue, Apr 10, 2012 at 4:50 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Andrew Lutomirski <luto@xxxxxxx> writes:
>
>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
>> <ebiederm@xxxxxxxxxxxx> wrote:
>>> Andy Lutomirski <luto@xxxxxxx> writes:
>>>
>>> My understanding of no_new_privs is that current_cred() including
>>> the user, the user namespace and the security label will never change,
>>> with the goal of making the security analysis simple.
>>
>> They can change but only if you already have the privilege to change
>> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
>> setuid, then drop caps is allowed and useful -- it's a race-free way
>> to make sure that a given uid never executes without no_new_privs set.
>>  I've implemented this as a pam module.
>
> Careful.  There is the security_task_fix_setuid call that will raise
> your capabilities from cap->effective to cap->permitted if you call
> setuid(0).  Which in the general case means you can regain all of the
> root privileges if you only have CAP_SETUID.
>

That's fine.  If you're running with CAP_SETUID and default
securebits, then you effectively have all capabilities already and
don't need to exploit setuid binaries to gain them.  no_new_privs
doesn't change that.  If you don't want to be able to gain all privs,
change securebits or drop CAP_SETUID.  seccomp reduces the kernel
attack surface; no_new_privs reduces the userspace attack surface.
But see below...
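
For anyone following along, here is a rough Python model of the setuid
fixup being discussed, as I understand the logic in security/commoncap.c.
It is a sketch, not kernel code: Cred and fix_setuid() are names made up
for illustration, capabilities are plain strings, and the bounding set and
the exec-time rules are left out entirely.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cred:
    """Hypothetical, heavily simplified stand-in for the kernel's struct cred."""
    ruid: int
    euid: int
    suid: int
    permitted: frozenset
    effective: frozenset


def fix_setuid(old, new, keep_caps=False, no_setuid_fixup=False):
    """Model of the capability fixup applied after a uid change.

    keep_caps and no_setuid_fixup stand in for the SECURE_KEEP_CAPS and
    SECURE_NO_SETUID_FIXUP securebits.
    """
    if no_setuid_fixup:
        # With SECURE_NO_SETUID_FIXUP the capability sets are left alone.
        return new
    was_root = 0 in (old.ruid, old.euid, old.suid)
    is_root = 0 in (new.ruid, new.euid, new.suid)
    if was_root and not is_root and not keep_caps:
        # Dropping every root uid clears permitted and effective.
        new = replace(new, permitted=frozenset(), effective=frozenset())
    if old.euid == 0 and new.euid != 0:
        # Leaving euid 0 clears the effective set.
        new = replace(new, effective=frozenset())
    if old.euid != 0 and new.euid == 0:
        # Gaining euid 0 raises effective to permitted.
        new = replace(new, effective=new.permitted)
    return new


# A non-root task whose effective set holds only CAP_SETUID calls setuid(0).
before = Cred(1000, 1000, 1000,
              permitted=frozenset({"CAP_SETUID", "CAP_NET_ADMIN"}),
              effective=frozenset({"CAP_SETUID"}))
after = fix_setuid(before, replace(before, ruid=0, euid=0, suid=0))
```

In this model, after.effective ends up equal to before.permitted, and
passing no_setuid_fixup=True leaves the effective set untouched, which is
the securebits escape hatch mentioned above.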

>
>>> I don't recall how seccomp filters are dealt with if you don't have
>>> no_new_privs enabled.  If seccomp filters installed by root
>>> are dropped when we change privilege levels it might be worth looking
>>> at how to keep a seccomp filter installed as long as you stay in
>>> a user namespace.
>>>
>>
>> They're not dropped.  I think in the current implementation they can't
>> be dropped at all.
>
> Which makes sense.   Is this why you need no_new_privs?  So you can't run
> seccomp on higher privileged executables and confuse them into keeping
> privileges when they should not?

Exactly.  seccomp is flexible enough that it's probably possible to
confuse many setuid executables with it.
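
As a small illustration of the flag itself, the sketch below sets
no_new_privs in a forked child through prctl(2) and reads it back. The
option numbers are the PR_SET_NO_NEW_PRIVS / PR_GET_NO_NEW_PRIVS values
from <linux/prctl.h>; the helper name no_new_privs_after_set() is made up
for this example, and going through ctypes is just a convenience, not how
a real sandbox would be written.

```python
import ctypes
import os
import sys

PR_SET_NO_NEW_PRIVS = 38  # values from <linux/prctl.h>
PR_GET_NO_NEW_PRIVS = 39


def no_new_privs_after_set():
    """Set no_new_privs in a forked child and return the value it reads back.

    Returns None when not on Linux or when prctl is unavailable or fails.
    The fork keeps the irreversible flag out of the calling process.
    """
    if not sys.platform.startswith("linux"):
        return None
    try:
        libc = ctypes.CDLL(None, use_errno=True)
        prctl = libc.prctl
    except (OSError, AttributeError):
        return None
    prctl.argtypes = [ctypes.c_int] + [ctypes.c_ulong] * 4
    prctl.restype = ctypes.c_int

    pid = os.fork()
    if pid == 0:
        code = 2  # exit status 2 means "something went wrong"
        try:
            if prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) == 0:
                flag = prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0)
                if flag in (0, 1):
                    code = flag
        finally:
            os._exit(code)

    _, status = os.waitpid(pid, 0)
    if not os.WIFEXITED(status):
        return None
    code = os.WEXITSTATUS(status)
    return code if code in (0, 1) else None
```

Once the flag is set it cannot be cleared and is inherited across fork
and exec, which is what keeps a filtered task from handing its filter to
a setuid binary that then runs with more privilege.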

>
>>> The emphasis is a bit different from no_new_privs as the user_namespace
>>> does not need to guarantee that the lsm will not change security labels,
>>> etc.
>>
>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
>> grants extra privileges that malfunctions when run inside a user
>> namespace, can that be used to break out of LSM restrictions?
>
> I can't see how it would not be safe.
>
> Except for the user namespace pointer the state the LSM and the rest of
> the kernel sees is the same state the kernel sees.  Aka userspace sees
> uid 0, the LSM does not.  So I don't know why a LSM would get confused.
>
> Beyond that it is a bug for an LSM to grant permissions beyond the
> core DAC model.  So the worst I can see is an LSM not grokking user
> namespaces and getting confused and not restricting a process as
> much as the designer of the LSM would like.

Right.  Suppose you have some program that has extra restrictions
applied by an LSM.  It executes a helper (e.g. Apache's suexec,
but I bet there are more examples) which is supposed to be very
careful not to leak privileges.  The LSM is set to restrict that
helper less than the parent process.  But that program was written
before user namespaces existed, and it has a bug (or missing feature)
that allows its parent to exploit it when run inside an unmapped user
namespace.  The parent can now escape from the LSM restrictions.

no_new_privs is designed to prevent exactly this issue.

>>
>> If a user namespace has no visible effect on processes that aren't
>> descendents of whoever created it, then creating one in no_new_privs
>> mode should be safe.  On the other hand, it could be somewhat useless.
>
> Creating a user namespace will allow a process access to more kernel
> facilities.  Aka you can (or at least will be able to) create network
> namespaces and mount namespaces and the like.  That increases the
> surface of the kernel an attacker can hit.
>
> So in a perfect kernel there are no effects on others.  In a scenario
> where you are limiting how much of the kernel a user can use I think
> you would want that.
>
> Still, given that you aren't doing the very restrictive "current_cred()
> must not change", I don't know how it matters, and a bpf based seccomp can
> pretty easily filter out new user namespace creation.  Shrug.

I'm not worried about that.  I'm more interested in whether
unprivileged user namespace creation should require nnp and/or whether
someone might want a mode in which a task has nnp set but can
create a user namespace that allows setuid execution inside the
namespace in spite of the nnp setting.  The latter is probably rather
complicated to get right and depends on nonexistent filesystem
features.

--Andy