Re: [PATCH 2/4] clone.2: Describe the user namespace

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Thu, 27 Dec 2012 09:20:38 -0800

"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:

> Hi Eric,
>
> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>>
>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>> ---
>>  man2/clone.2 |   39 +++++++++++++++++++++++++++++++++++++++
>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>
>> diff --git a/man2/clone.2 b/man2/clone.2
>> index 0582057..4566677 100644
>> --- a/man2/clone.2
>> +++ b/man2/clone.2
>> @@ -366,6 +366,45 @@ in the same
>>  .BR clone ()
>>  call.
>>  .TP
>> +.BR CLONE_NEWUSER " (since Linux 3.6)"
>
> Why "since Linux 3.6"? As far as I can see, CLONE_NEWUSER first gained
> some meaning in 2.6.29.

Looking at it, where I said 3.6 I was wrong.  I meant 3.5.

I think I made the same mistake in one or two other manpages.  Nothing
was merged in 3.6 unfortunately.

My intent was to say that these are the semantics of user namespaces
since 3.5, when my rework/refocusing of them was merged.

Since 3.5 all that has really happened with user namespaces is that the
uid/gid to kuid/kgid conversion was done, permission checks have been
relaxed, and a few bugs have been fixed.

3.8 is huge from a usability standpoint, because setns() and unshare()
are now complete from a namespace perspective, and because enough
permission checks have been relaxed in user namespaces that you can
really start using them.

But semantically from a user namespace perspective nothing really has
changed in 3.8.

>> +If
>> +.B CLONE_NEWUSER
>> +is set, then create the process in a new user namespace.  If this flag is not set, then (as with
>> +.BR fork (2)),
>> +the process is created in the same user namespace as the calling process.
>> +
>> +A user namespace provides an isolated environment for security related identifiers in particular
>> +uids, gids, keys (see
>> +.BR keyctl (2)),
>> +and capabilities.
>> +
>> +When a user namespace is created it initially starts out without a mapping of uids and gids
>> +to the parent user namespace.  The desired mapping of uids to the parent user namespace
>> +may be set by writing into
>> +.IR /proc/[pid]/uid_map.
>> +The desired mapping of gids to the parent user namespace may be set by writing into
>> +.IR /proc/[pid]/gid_map.
>> +
>> +The first process in a user namespace starts out with a complete set of capabilities with
>> +respect to the new user namespace.
>> +
>> +syscalls that return uids and gids will either return the uid or gid mapped into the current
>> +user namespace if there is a mapping or depending on the context will return either
>> +the overflowuid (default 65534) or the overflowgid (default 65534). See
>> +.IR /proc/sys/kernel/overflowuid, /proc/sys/kernel/overflowgid
>> +
>> +As of Linux 3.8 no privileges are needed to create a user namespace,
>> +and mount, pid, ipc, net, uts namespaces can be created with just
>> +CAP_SYS_ADMIN privileges in your current user namespace.
>> +
>> +Over the years there have been a lot of features that have been added
>> +to the linux kernel that are only available to privileged users
>> +because of their potential to confuse setuid root applications.  In
>> +general it becomes safe to allow the root user in a user namespace to
>> +use those features because it is impossible while in a user namespace
>> +to gain more privilege than the root user of a user namespace has.
>> +
>> +.TP
>>  .BR CLONE_NEWPID " (since Linux 2.6.24)"
>>  .\" This explanation draws a lot of details from
>>  .\" http://lwn.net/Articles/259217/
>
> I reworked your text somewhat. Could you please review the following:
>
> [[
>        CLONE_NEWUSER
>               (This  flag first became meaningful for clone() in Linux
>               2.6.29, but the implementation of  user  namespaces  was
>               only  completed in Linux 3.8.)

Long rant about 2.6.29 vs 3.8 above.  I think what we need to say is:

                (This  flag first became meaningful for clone() in Linux
                2.6.29, the current semantics were merged in 3.5, and
                user namespaces only really became usable in 3.8.)

>                                               If CLONE_NEWUSER is set,
>               then create the process in a  new  user  namespace.   If
>               this flag is not set, then (as with fork(2)) the process
>               is created in the same user  namespace  as  the  calling
>               process.
>
>               A  user  namespace  provides an isolated environment for
>               security related identifiers, in particular,  user  IDs,
>               group IDs, keys (see keyctl(2)), and capabilities.
>
>               When  a user namespace is created, it starts out without
>               a mapping of user IDs (group IDs)  to  the  parent  user
>               namespace.   The desired mapping of user IDs (group IDs)
>               to the parent user namespace may be set by writing  into
>               /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).

                /proc/[pid]/projid_map deserves a mention.  Not that
                I am a fan of project IDs, or that xfs, where they are
                implemented, has been converted yet, but....
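
To make the map files concrete, here is a minimal sketch of writing one.
It assumes the line format that, as far as I understand proc(5), all
three files share: "first-ID-inside first-ID-in-parent count", one
extent per line.  The helper names format_id_map() and write_id_map()
are hypothetical, not an existing API, and the write only succeeds if
the kernel's permission checks on the map file allow it.

```python
import os
from typing import Iterable, Tuple

# One extent: (first ID inside the namespace, first ID in the parent, count).
Extent = Tuple[int, int, int]

MAP_FILES = ("uid_map", "gid_map", "projid_map")


def format_id_map(extents: Iterable[Extent]) -> str:
    """Render extents in the assumed 'inside outside count' line format."""
    lines = []
    for inside, outside, count in extents:
        if inside < 0 or outside < 0 or count <= 0:
            raise ValueError(f"bad extent: {(inside, outside, count)}")
        lines.append(f"{inside} {outside} {count}\n")
    return "".join(lines)


def write_id_map(pid: int, map_name: str, extents: Iterable[Extent]) -> None:
    """Write the whole map to /proc/<pid>/<map_name> with a single write()."""
    if map_name not in MAP_FILES:
        raise ValueError(f"unknown map file: {map_name}")
    data = format_id_map(extents).encode("ascii")
    fd = os.open(f"/proc/{pid}/{map_name}", os.O_WRONLY)
    try:
        # The map is passed in one write() call rather than line by line.
        os.write(fd, data)
    finally:
        os.close(fd)
```

For example, write_id_map(child_pid, "uid_map", [(0, 1000, 1)]) would
ask for uid 0 in the new namespace to correspond to uid 1000 in the
parent, where child_pid is whatever process lives in the new namespace.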

>               The  first process in a user namespace starts out with a
>               complete set of capabilities with  respect  to  the  new
>               user namespace.
>
>               System  calls  that  return  user  IDs  (group IDs) will
>               return either the user ID (group  ID)  mapped  into  the
>               current  user  namespace  if  there is a mapping, or the
>               overflow user ID (group ID); the default value  for  the
>               overflow  user ID (group ID) is 65534.  See the descrip‐
>               tions of /proc/sys/kernel/overflowuid and /proc/sys/ker‐
>               nel/overflowgid in proc(5).
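
The lookup rule in that paragraph can be sketched as a small model.
This is an illustration of the described behaviour, not the kernel's
code: the function names are hypothetical, the extents use the same
(inside, parent, count) triples as the map files, and 65534 is the
default quoted above.

```python
from typing import Iterable, Tuple

# One extent: (first ID inside the namespace, first ID in the parent, count).
Extent = Tuple[int, int, int]

DEFAULT_OVERFLOW_ID = 65534


def read_overflow_id(path: str = "/proc/sys/kernel/overflowuid") -> int:
    """Read the overflow ID from procfs, falling back to the default."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return DEFAULT_OVERFLOW_ID


def map_id_into_namespace(parent_id: int,
                          extents: Iterable[Extent],
                          overflow_id: int = DEFAULT_OVERFLOW_ID) -> int:
    """Return the ID as seen inside the namespace, or the overflow ID."""
    for inside, outside, count in extents:
        if outside <= parent_id < outside + count:
            # Mapped: keep the same offset within the extent.
            return inside + (parent_id - outside)
    # No extent covers this ID, so the caller sees the overflow ID.
    return overflow_id
```

With the single extent (0, 1000, 1), parent uid 1000 is reported as 0
inside the namespace, while parent uid 1001 has no mapping and is
reported as the overflow user ID.
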
>
>               Starting  with  Linux  3.8,  no privileges are needed to
>               create a user namespace, and mount, PID, IPC,  net,  and
>               UTS   namespaces   can   be   created   with   just  the
>               CAP_SYS_ADMIN capability in the caller's user namespace.
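
A quick way to check the "no privileges are needed" claim on a given
kernel is to try it.  The sketch below forks a child that calls
unshare(CLONE_NEWUSER) through ctypes and reports back what happened.
The function name is hypothetical, the constant is the value from
<linux/sched.h>, and whether the call succeeds depends on the kernel
version and on local policy (some systems restrict unprivileged user
namespaces).

```python
import ctypes
import os

CLONE_NEWUSER = 0x10000000  # value from <linux/sched.h>


def probe_new_user_namespace():
    """Try unshare(CLONE_NEWUSER) in a forked child.

    Returns ("ok", uid) with the uid the child saw afterwards, or
    ("error", errno) if the call failed (errno is 0 if the child died
    before it could report).
    """
    libc = ctypes.CDLL(None, use_errno=True)
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: single-threaded, so unshare() of a user namespace is allowed.
        os.close(read_fd)
        try:
            if libc.unshare(CLONE_NEWUSER) != 0:
                msg = f"error {ctypes.get_errno()}"
            else:
                # With no uid_map written yet, this is normally the overflow uid.
                msg = f"ok {os.getuid()}"
            os.write(write_fd, msg.encode("ascii"))
        finally:
            os._exit(0)
    # Parent: collect the child's one-line report.
    os.close(write_fd)
    with os.fdopen(read_fd) as f:
        data = f.read()
    os.waitpid(pid, 0)
    parts = data.split()
    if len(parts) != 2:
        return ("error", 0)
    return (parts[0], int(parts[1]))
```

Run as an ordinary user, a result of ("ok", ...) means the namespace
was created without privilege; ("error", ...) carries the errno to look
up.
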
>
>               Over the years, there have been a lot of  features  that
>               have been added to the Linux kernel that are only avail‐
>               able to privileged users because of their  potential  to
>               confuse  set-user-ID-root  applications.  In general, it
>               becomes safe to allow the root user in a user  namespace
>               to use those features because it is impossible, while in
>               a user namespace, to gain more privilege than  the  root
>               user of a user namespace has.

I don't have any problems with this bit of text.

It occurs to me that what is going on with capabilities and user
namespaces needs to be documented better.  There was a minor bug with
them this release cycle, and I realized that while the current
definition makes sense and isn't hard to understand in general, in
detail the interaction of capabilities and user namespaces is hard to
describe.

I think documenting capabilities and user namespaces is work for a
future patch, however.

> ]]
>
> Thanks,
>
> Michael