Re: [REVIEW][PATCH 0/43] Completing the user namespace

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Tue, 10 Apr 2012 14:59:52 -0700

Andy Lutomirski <luto@xxxxxxx> writes:

> On 04/07/2012 10:10 PM, Eric W. Biederman wrote:
>> 
>> This is a course correction for the user namespace, so that we can reach
>> an inexpensive, maintainable, and reasonably complete implementation.
>> 
>> If anyone can think of a reason why the user namespace should not
>> evolve in the direction taken in this patchset please let me know.
>> 
>> There is not an obvious maintainer for the scope of what this patchset
>> covers so I intend to host this tree myself and to place it in
>> linux-next after this round of review.
>> 
>> Highlights.
>> - The kernel will now fail to build if you attempt to compile in
>>   code whose permission checks have not been updated to be user
>>   namespace safe.
>> 
>> - All uids from child user namespaces are mapped into the initial user
>>   namespace before they are processed.  Removing the need to add
>>   an additional check to see if the user namespace of the compared
>>   uids remains the same.
>
> [...]
>
> I haven't read enough of the details to figure out how the uid mapping
> works (do all the child namespace uids map to the same parent uid?), so
> I may be missing some details here.

You seem to be missing a detail or two.

What you want to look at are the functions make_kuid and from_kuid
in kernel/user_namespace.c.  You might also look at the patches that
talk about uidgid.h and introducing a mapping layer.

The implementation creates an incomplete but 1-1 mapping to the uids in
the initial user namespace.  That means that, except for the change in
data type (sigh), the existing permission checks don't need to be changed.
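
To make "incomplete but 1-1" concrete, here is a minimal userspace
sketch of an extent-based id map.  It is an illustrative model, not the
code from the patchset: the names id_extent, id_map, map_id_down and
map_id_up, and the limit of 5 extents, are assumptions made for the
example.  Ids covered by an extent translate one-to-one in both
directions; ids outside every extent have no counterpart at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Sentinel for "this id has no mapping" (assumed for the sketch). */
#define ID_INVALID ((uint32_t)-1)

/* One contiguous range of ids (hypothetical layout). */
struct id_extent {
    uint32_t first;       /* first id as seen inside the child namespace */
    uint32_t lower_first; /* matching first id in the initial namespace  */
    uint32_t count;       /* number of consecutive ids in the range      */
};

struct id_map {
    size_t nr_extents;
    struct id_extent extent[5];
};

/* Child namespace id -> initial namespace id, or ID_INVALID. */
uint32_t map_id_down(const struct id_map *map, uint32_t id)
{
    for (size_t i = 0; i < map->nr_extents; i++) {
        const struct id_extent *e = &map->extent[i];
        if (id >= e->first && id - e->first < e->count)
            return e->lower_first + (id - e->first);
    }
    return ID_INVALID; /* incomplete: not every id is mapped */
}

/* Initial namespace id -> child namespace id, or ID_INVALID. */
uint32_t map_id_up(const struct id_map *map, uint32_t id)
{
    for (size_t i = 0; i < map->nr_extents; i++) {
        const struct id_extent *e = &map->extent[i];
        if (id >= e->lower_first && id - e->lower_first < e->count)
            return e->first + (id - e->lower_first);
    }
    return ID_INVALID;
}
```

With a single extent { 0, 100000, 1000 }, child uid 0 maps down to
100000 and back up to 0, while child uid 1000 has no mapping.  A map
with no extents at all maps nothing.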

I change the data type from uid_t to kuid_t for everything internal
and make the two assignment incompatible, to force the use of
make_kuid and from_kuid at the boundaries with user space and with
the filesystems.  Unfortunately that means uid comparisons themselves
must change a little, i.e. uid_eq instead of ==.  That grand
search and replace is probably the scariest bit of this patchset.
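
The type trick can be sketched in plain userspace C.  This is a
simplified model, not the kernel code: the struct ns_model argument
(one contiguous range per namespace) and the uid_valid helper are
assumptions for the example, and the real make_kuid and from_kuid
have different signatures.  Wrapping the value in a struct is what
makes the compiler reject both a plain assignment from an integer and
a comparison with ==.

```c
#include <stdbool.h>
#include <stdint.h>

/* A struct wrapper: not assignable from, or comparable to, an integer. */
typedef struct { uint32_t val; } kuid_t;

#define INVALID_KUID ((kuid_t){ (uint32_t)-1 })

/* Hypothetical namespace model: one contiguous mapped range. */
struct ns_model {
    uint32_t first;       /* first uid inside this namespace       */
    uint32_t lower_first; /* matching uid in the initial namespace */
    uint32_t count;       /* size of the mapped range              */
};

/* Boundary in: a raw uid from user space becomes a kuid_t. */
kuid_t make_kuid(const struct ns_model *ns, uint32_t uid)
{
    if (uid >= ns->first && uid - ns->first < ns->count)
        return (kuid_t){ ns->lower_first + (uid - ns->first) };
    return INVALID_KUID;
}

/* Boundary out: a kuid_t becomes a raw uid as seen from ns. */
uint32_t from_kuid(const struct ns_model *ns, kuid_t kuid)
{
    if (kuid.val >= ns->lower_first &&
        kuid.val - ns->lower_first < ns->count)
        return ns->first + (kuid.val - ns->lower_first);
    return (uint32_t)-1;
}

/* "a == b" does not compile for structs, so comparisons go through here. */
bool uid_eq(kuid_t a, kuid_t b)
{
    return a.val == b.val;
}

bool uid_valid(kuid_t kuid)
{
    return kuid.val != (uint32_t)-1;
}
```

In this model uid 0 in a child namespace mapped at 100000 and uid
100000 in the initial namespace produce kuid_t values that compare
equal with uid_eq, which is why the permission checks themselves do
not need to know about namespaces.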

> As a bit of background, the no_new_privs mode introduced in the big
> seccomp patchset will add a flag that any task can set to prevent it or
> any of its children from gaining privileges by using execve.
>
> How should this interact with pid namespaces?

I assume you mean uid namespaces?  I don't expect pid namespaces will
have any effect.

> As a first pass, I
> imagine that the main PR_SET_NO_NEW_PRIVS(1) mode will prevent setuid
> from working inside uid namespaces as well, but there may be interest in
> weaker variants that allow setuid inside namespaces.

> Any thoughts?

It looks like a big strength of seccomp is reducing the attack surface
of the kernel.  The user namespace will actually increase that attack
surface by making much more of the functionality available.  So on that
level they are very different mechanisms.

My understanding of no_new_privs is that current_cred() including
the user, the user namespace and the security label will never change,
with the goal of making the security analysis simple.
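
For reference, setting the flag from user space might look like the
sketch below.  It assumes the prctl interface proposed in the seccomp
patchset, with option numbers 38 and 39 for set and get; the fallback
defines and the wrapper names set_no_new_privs and get_no_new_privs
are only there for the example.

```c
#include <sys/prctl.h>

/* Fallbacks for older headers; values assumed from the proposed interface. */
#ifndef PR_SET_NO_NEW_PRIVS
#define PR_SET_NO_NEW_PRIVS 38
#endif
#ifndef PR_GET_NO_NEW_PRIVS
#define PR_GET_NO_NEW_PRIVS 39
#endif

/* Returns 0 on success, -1 on error (for example, an unsupported kernel). */
int set_no_new_privs(void)
{
    return prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0);
}

/* Returns 1 if the flag is set, 0 if not, -1 on error. */
int get_no_new_privs(void)
{
    return prctl(PR_GET_NO_NEW_PRIVS, 0, 0, 0, 0);
}
```

The flag is one-way: once set it is inherited across fork and execve
and there is no call to clear it.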

I don't recall how seccomp filters are dealt with if you don't have
no_new_privs enabled.  If seccomp filters installed by root
are dropped when we change privilege levels it might be worth looking
at how to keep a seccomp filter installed as long as you stay in
a user namespace.

There are essentially two modes you can use the user namespace in:
with mappings set up (a privileged operation) and with no mappings.

With no mappings you cannot create a new user namespace or change your
uid or gids, and suid exec fails (or possibly ignores the uid/gid change,
but I am starting with suid exec fails).  That makes user namespaces
similar to no_new_privs.

The emphasis is a bit different from no_new_privs, as the user namespace
does not need to guarantee that the lsm will not change security labels,
etc.

At a basic level of interaction I expect no_new_privs will need to fail
any change of the user namespace, as changing the user namespace
changes current_cred() and fundamentally allows more things to happen.

Overall I think the two mechanisms are complementary.  I actually don't
expect any fundamental conflicts in the code between user namespaces and
no_new_privs.  I do expect conflicts in the patches, because some of the
same code paths will be touched.  Both approaches change exec, and the
user namespace implementation lightly touches every permission
check in the kernel.

Where seccomp focuses on making things secure, the user namespace focuses
on making things possible that were not before.  In particular the user
namespace makes it easily possible and generally safe to allow suid exec,
creation of namespaces, and capable calls with respect to namespaces we
have created.

With both mechanisms in play I would expect the implementation of
no_new_privs to be able to focus on limiting the attack surface and not
have to worry about allowing more kernel functionality to be used.
The user namespace can then focus on increasing the functionality
exported to unprivileged users.

Eric