Re: [REVIEW][PATCH 0/43] Completing the user namespace

Andrew Lutomirski <luto@xxxxxxx> · Tue, 10 Apr 2012 18:00:59 -0700

On Tue, Apr 10, 2012 at 6:01 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Andrew Lutomirski <luto@xxxxxxx> writes:
>
>> On Tue, Apr 10, 2012 at 4:50 PM, Eric W. Biederman
>> <ebiederm@xxxxxxxxxxxx> wrote:
>>> Andrew Lutomirski <luto@xxxxxxx> writes:
>>>
>>>> On Tue, Apr 10, 2012 at 2:59 PM, Eric W. Biederman
>>>> <ebiederm@xxxxxxxxxxxx> wrote:
>>>>> Andy Lutomirski <luto@xxxxxxx> writes:
>>>>>
>>>>> My understanding of no_new_privs is that current_cred() including
>>>>> the user, the user namespace and the security label will never change,
>>>>> with the goal of making the security analysis simple.
>>>>
>>>> They can change but only if you already have the privilege to change
>>>> them yourself and then you do so.  For example, PR_SET_NO_NEW_PRIVS,
>>>> setuid, then drop caps is allowed and useful -- it's a race-free way
>>>> to make sure that a given uid never executes without no_new_privs set.
>>>>  I've implemented this as a pam module.
>>>
>>> Careful.  There is the security_task_fix_setuid call that will raise
>>> your capabilities from cap->effective to cap->permitted if you call
>>> setuid(0).  Which in the general case means you can regain all of the
>>> root privileges if you only have CAP_SETUID.
>>>
>>
>> That's fine.  If you're running with CAP_SETUID and default
>> securebits, then you effectively have all capabilities already and
>> don't need to exploit setuid binaries to gain them.  no_new_privs
>> doesn't change that.  If you don't want to be able to gain all privs,
>> change securebits or drop CAP_SETUID.  seccomp reduces the kernel
>> attack surface; no_new_privs reduces the userspace attack surface.
>> But see below...
>>
>>
>>>
>>>>> I don't recall how seccomp filters are dealt with if you don't have
>>>>> no_new_privs enabled.  If seccomp filters installed by root
>>>>> are dropped when we change privilege levels it might be worth looking
>>>>> at how to keep a seccomp filter installed as long as you stay in
>>>>> a user namespace.
>>>>>
>>>>
>>>> They're not dropped.  I think in the current implementation they can't
>>>> be dropped at all.
>>>
>>> Which makes sense.   Is this why you need no_new_privs?  So you can't run
>>> seccomp on higher privileged executables and confuse them into keeping
>>> privileges when they should not?
>>
>> Exactly.  seccomp is flexible enough that it's probably possible to
>> confuse many setuid executables with it.
>>
>>>
>>>>> The emphasis is a bit different from no_new_privs as the user_namespace
>>>>> does not need to guarantee that the lsm will not change security labels,
>>>>> etc.
>>>>
>>>> Hmm.  Is this safe?  For example, if there's a program that LSM policy
>>>> grants extra privileges that malfunctions when run inside a user
>>>> namespace, can that be used to break out of LSM restrictions?
>>>
>>> I can't see how it would not be safe.
>>>
>>> Except for the user namespace pointer, the state the LSM and the rest of
>>> the kernel see is the same state the kernel sees.  Aka userspace sees
>>> uid 0, the LSM does not.  So I don't know why an LSM would get confused.
>>>
>>> Beyond that it is a bug for an LSM to grant permissions beyond the
>>> core DAC model.  So the worst I can see is an LSM not grokking user
>>> namespaces and getting confused and not restricting a process as
>>> much as the designer of the LSM would like.
>>
>> Right.  Suppose you have some program that has extra restrictions
>> applied by an LSM.  It executes a helper (e.g. Apache's suexec
>> thing, but I bet there are more examples) which is supposed to be very
>> careful not to leak privileges.  The LSM is set to restrict that
>> helper less than the parent process.  But that program was written
>> before user namespaces existed, and it has a bug (or missing feature)
>> that allows its parent to exploit it when run inside an unmapped user
>> namespace.  The parent can now escape from the LSM restrictions.
>>
>> no_new_privs is designed to prevent exactly this issue.
>
> Currently the suid exec will fail because the uids don't map.
>
> I might switch that around to simply ignoring the change of uid
> on suid exec.  I have a patch in my devel tree that plays with
> that idea.  However, although I hit that case once in testing
> (I think it was ping), I don't think running suid executables
> is particularly interesting.
>
> Certainly the application program won't care or break, because we are
> still bounded by the usual DAC security.
>
> I wonder a little if the lsm might change labels on exec of a
> non suid binary.  That case is more interesting in the unmapped
> unprivileged user namespace.
>
> But I just can't seem to care.  The LSM is the line behind which we hide
> the crazy.

Sounds like you're reinventing (something very similar to)
no_new_privs.  Why not just require no_new_privs as a prerequisite for
creating a user namespace if you're unprivileged?
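
To make the suggestion concrete, here is a rough sketch of the rule I
have in mind, written as a userspace model rather than kernel code.
The function names are hypothetical, and "privileged" stands in for
whatever capability check the kernel would actually use, which this
thread has not settled.  The prctl option numbers are the ones from
<linux/prctl.h>; the helper that reads the flag is Linux-only.

```python
import ctypes

# prctl option numbers from <linux/prctl.h>.
PR_SET_NO_NEW_PRIVS = 38
PR_GET_NO_NEW_PRIVS = 39


def may_create_user_namespace(privileged: bool, no_new_privs: bool) -> bool:
    """Model of the proposed rule (hypothetical, not actual kernel logic).

    A privileged caller may always create a user namespace.  An
    unprivileged caller may do so only after setting no_new_privs.
    """
    return privileged or no_new_privs


def no_new_privs_is_set() -> bool:
    """Ask the kernel whether the calling thread has no_new_privs set.

    Linux-only.  Raises OSError if the prctl call fails, for example on
    a kernel that does not support PR_GET_NO_NEW_PRIVS.
    """
    libc = ctypes.CDLL(None, use_errno=True)
    zero = ctypes.c_ulong(0)  # the unused arguments must be zero
    rc = libc.prctl(PR_GET_NO_NEW_PRIVS, zero, zero, zero, zero)
    if rc < 0:
        raise OSError(ctypes.get_errno(), "prctl(PR_GET_NO_NEW_PRIVS) failed")
    return rc == 1
```

An unprivileged caller would then be expected to do
prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) before asking for the new
namespace, and the check above would refuse the request otherwise.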

--Andy