Re: [PATCH 2/4] clone.2: Describe the user namespace

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Tue, 01 Jan 2013 01:45:36 -0800

"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:

> Hi Eric,
>
> On Thu, Dec 27, 2012 at 6:20 PM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>
>>> Hi Eric,
>>>
>>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>>> <ebiederm@xxxxxxxxxxxx> wrote:
>>>>
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>>> ---
>>>>  man2/clone.2 |   39 +++++++++++++++++++++++++++++++++++++++
>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/man2/clone.2 b/man2/clone.2
>>>> index 0582057..4566677 100644
>>>> --- a/man2/clone.2
>>>> +++ b/man2/clone.2
>>>> @@ -366,6 +366,45 @@ in the same
>>>>  .BR clone ()
>>>>  call.
>>>>  .TP
>>>> +.BR CLONE_NEWUSER " (since Linux 3.6)"
>>>
>>> Why "since Linux 3.6"? As far as I can see, CLONE_NEWUSER first gained
>>> some meaning in 2.6.29.
>>
>> Looking at it, where I have said 3.6, that is wrong.  I meant 3.5.
>
> Okay.
>
>> I think I made the same mistake in one or two other manpages.  Nothing
>> was merged in 3.6 unfortunately.
>
> I think the other cases have been fixed by now.
>
>> My intent was these are the semantics of user namespaces since 3.5,
>> when my rework/refocusing of them was merged.
>>
>> Since 3.5 all that has really happened with user namespaces is the
>> uid/gid to kuid/kgid conversion, permission checks have been relaxed,
>> and a few bugs have been fixed.
>>
>> 3.8 is huge from a usability standpoint.  3.8 is huge because setns()
>> and unshare() are now complete from a namespace perspective, and because
>> enough permission checks have been relaxed in user namespaces that you
>> can really start using them.
>>
>> But semantically from a user namespace perspective nothing really has
>> changed in 3.8.
>>
> [...]
>
>>> I reworked your text somewhat. Could you please review the following:
>>>
>>> [[
>>>        CLONE_NEWUSER
>>>               (This  flag first became meaningful for clone() in Linux
>>>               2.6.29, but the implementation of  user  namespaces  was
>>>               only  completed in Linux 3.8.)
>>
>> Long rant about 2.6.29 vs 3.8 above.  I think what we need to say is:
>>
>>                 (This  flag first became meaningful for clone() in Linux
>>                 2.6.29, the current semantics were merged in
>>                 3.5, and user namespaces only really became usable in 3.8.)
>
> Yup. I've done something like that now.
>
>>>                                               If CLONE_NEWUSER is set,
>>>               then create the process in a  new  user  namespace.   If
>>>               this flag is not set, then (as with fork(2)) the process
>>>               is created in the same user  namespace  as  the  calling
>>>               process.
>>>
>>>               A  user  namespace  provides an isolated environment for
>>>               security-related identifiers, in particular, user IDs,
>>>               group IDs, keys (see keyctl(2)), and capabilities.
>>>
>>>               When  a user namespace is created, it starts out without
>>>               a mapping of user IDs (group IDs)  to  the  parent  user
>>>               namespace.   The desired mapping of user IDs (group IDs)
>>>               to the parent user namespace may be set by writing  into
>>>               /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>>
>>                 /proc/[pid]/projid_map deserves a mention.  Not that
>>                 I am a fan of project IDs, or that XFS, where they are
>>                 implemented, has been converted yet, but...
>
> Would you be able to send a patch documenting this in proc(5)?

Sure.  I don't know why I didn't mention projid in my earlier patch.
Same story: fewer permission checks.  Silly me.
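For illustration, here is a minimal C sketch of how a parent might fill in one of these map files (uid_map, gid_map or projid_map) for a child it created with CLONE_NEWUSER. It assumes a single-line mapping in the "ID-inside-ns ID-outside-ns length" format described in proc(5), and because each map file can be written only once, it issues the complete mapping in a single write(2) call. The helper names format_id_map_line() and write_id_map() are hypothetical, and the rules about which process may write each file are left to proc(5).

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: format one mapping line,
   "ID-inside-ns ID-outside-ns length\n".
   Returns the line length, or -1 if it does not fit in buf. */
static int format_id_map_line(char *buf, size_t size, unsigned long inside,
                              unsigned long outside, unsigned long count)
{
    int n = snprintf(buf, size, "%lu %lu %lu\n", inside, outside, count);
    if (n < 0 || (size_t) n >= size)
        return -1;
    return n;
}

/* Hypothetical helper: write a mapping to /proc/[pid]/<map_file>, where
   map_file is e.g. "uid_map", "gid_map" or "projid_map". The file can be
   written only once, so the whole mapping goes out in one write(2).
   Returns 0 on success, -1 on failure. */
static int write_id_map(pid_t pid, const char *map_file, const char *line)
{
    char path[64];
    int n = snprintf(path, sizeof path, "/proc/%ld/%s", (long) pid, map_file);
    if (n < 0 || (size_t) n >= sizeof path)
        return -1;

    int fd = open(path, O_WRONLY | O_CLOEXEC);
    if (fd == -1)
        return -1;

    size_t len = strlen(line);
    ssize_t written = write(fd, line, len);   /* exactly one write */
    close(fd);
    return written == (ssize_t) len ? 0 : -1;
}
```

A parent would typically build the line with format_id_map_line(buf, sizeof buf, 0, getuid(), 1), which maps user ID 0 inside the child's namespace to the caller's own user ID outside it, and then pass it to write_id_map(child_pid, "uid_map", buf) before the child depends on its IDs.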

>>>               The  first process in a user namespace starts out with a
>>>               complete set of capabilities with  respect  to  the  new
>>>               user namespace.
>>>
>>>               System  calls  that  return  user  IDs  (group IDs) will
>>>               return either the user ID (group  ID)  mapped  into  the
>>>               current  user  namespace  if  there is a mapping, or the
>>>               overflow user ID (group ID); the default value  for  the
>>>               overflow user ID (group ID) is 65534.  See the
>>>               descriptions of /proc/sys/kernel/overflowuid and
>>>               /proc/sys/kernel/overflowgid in proc(5).
>>>
>>>               Starting  with  Linux  3.8,  no privileges are needed to
>>>               create a user namespace, and mount, PID, IPC,  net,  and
>>>               UTS   namespaces   can   be   created   with   just  the
>>>               CAP_SYS_ADMIN capability in the caller's user namespace.
>>>
>>>               Over the years, there have been a lot of  features  that
>>>               have been added to the Linux kernel that are only avail‐
>>>               able to privileged users because of their  potential  to
>>>               confuse  set-user-ID-root  applications.  In general, it
>>>               becomes safe to allow the root user in a user  namespace
>>>               to use those features because it is impossible, while in
>>>               a user namespace, to gain more privilege than  the  root
>>>               user of a user namespace has.
>>
>> I don't have any problems with this bit of text.
>>
>> It occurs to me that what is going on with capabilities and user
>> namespaces needs to be documented better.  There was a minor bug with
>> them this release cycle, and I realized that while the current definition
>> makes sense and isn't hard to understand in general, in detail the
>> interaction of capabilities and user namespaces is hard to describe.
>>
>> I think capabilities and user namespaces are the work of a future patch,
>> however.
>
> Okay. So, below, a new iteration of the text. Could you please check
> it over, and note any errors to be fixed or improvements to be made.
>
> Thanks,
>
> Michael
>
>        CLONE_NEWUSER
>               (This  flag first became meaningful for clone() in Linux
>               2.6.23, the current clone()  semantics  were  merged  in
>               Linux 3.5, and the final pieces to make the user
>               namespaces completely usable were merged in Linux 3.8.)
>
>               If CLONE_NEWUSER is set, then create the  process  in  a
>               new  user  namespace.  If this flag is not set, then (as
>               with fork(2)) the process is created in  the  same  user
>               namespace as the calling process.
>
>               A  user  namespace  provides an isolated environment for
>               security-related identifiers, in particular, user IDs,
>               group IDs, keys (see keyctl(2)), and capabilities.
>
>               When  a user namespace is created, it starts out without
>               a mapping of user IDs (group IDs)  to  the  parent  user
>               namespace.   The desired mapping of user IDs (group IDs)
>               to the parent user namespace may be set by writing  into
>               /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>
>               The  first process in a user namespace starts out with a
>               complete set of capabilities with  respect  to  the  new
>               user namespace.
>
>               System  calls  that  return  user  IDs  (group IDs) will
>               return either the user ID (group  ID)  mapped  into  the
>               current  user  namespace  if  there is a mapping, or the
>               overflow user ID (group ID); the default value  for  the
>               overflow user ID (group ID) is 65534.  See the
>               descriptions of /proc/sys/kernel/overflowuid and
>               /proc/sys/kernel/overflowgid in proc(5).
>
>               Use  of  this flag requires a kernel configured with the
>               CONFIG_USER_NS  option.   Before  Linux  3.8,   use   of
>               CLONE_NEWUSER  required that the caller have three capa‐
>               bilities:  CAP_SYS_ADMIN,  CAP_SETUID,  and  CAP_SETGID.
>               Starting  with  Linux  3.8,  no privileges are needed to
>               create a user namespace, and mount, PID, IPC,  net,  and
>               UTS   namespaces   can   be   created   with   just  the
>               CAP_SYS_ADMIN capability in the caller's user namespace.
>
>               Over the years, there have been a lot of  features  that
>               have been added to the Linux kernel that are only avail‐
>               able to privileged users because of their  potential  to
>               confuse  set-user-ID-root  applications.  In general, it
>               becomes safe to allow the root user in a user  namespace
>               to use those features because it is impossible, while in
>               a user namespace, to gain more privilege than  the  root
>               user of a user namespace has.

I don't see anything wrong with that text.
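The statement that the first process in a new user namespace starts with a complete set of capabilities (with respect to that namespace only) can be observed from inside such a process by reading the CapEff field of /proc/self/status, ideally before any execve(2), which may recompute the capability sets. What follows is a minimal C sketch; the helper names parse_cap_eff() and read_own_cap_eff() are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: extract the hexadecimal CapEff mask from text in
   the format of /proc/[pid]/status. Returns 0 and stores the mask in
   *mask, or -1 if the field is missing or has no digits. */
static int parse_cap_eff(const char *status, unsigned long long *mask)
{
    const char *field = strstr(status, "CapEff:");
    if (field == NULL)
        return -1;
    const char *digits = field + strlen("CapEff:");
    char *end;
    /* strtoull() skips the leading tab before the hex digits. */
    unsigned long long value = strtoull(digits, &end, 16);
    if (end == digits)
        return -1;
    *mask = value;
    return 0;
}

/* Hypothetical helper: read the calling process's effective capability
   mask. The mask is relative to the caller's own user namespace.
   Returns 0 on success, -1 on failure. */
static int read_own_cap_eff(unsigned long long *mask)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;
    char line[256];
    int rc = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "CapEff:", 7) == 0) {
            rc = parse_cap_eff(line, mask);
            break;
        }
    }
    fclose(f);
    return rc;
}
```

Called in the child of a clone(CLONE_NEWUSER), read_own_cap_eff() would be expected to report every capability bit the kernel supports; called in an ordinary unprivileged process it typically reports 0. Either way, a full mask grants nothing in the parent user namespace.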

Happy New Year.

Eric
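To see the overflow-ID rule from the quoted text in action, a child can be created with CLONE_NEWUSER and asked what getuid(2) returns before anyone has written its uid_map. This is a minimal C sketch, assuming the glibc clone() wrapper, a kernel built with CONFIG_USER_NS, and (for unprivileged callers) Linux 3.8 or later. Where user namespaces are unavailable or blocked, clone() fails and the hypothetical helper uid_seen_in_new_userns() returns -1.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (1024 * 1024)

/* Child: runs in the new user namespace. No uid_map has been written,
   so getuid() reports the overflow UID. The value is sent to the parent. */
static int report_uid(void *arg)
{
    int write_fd = *(int *) arg;
    uid_t uid = getuid();
    return write(write_fd, &uid, sizeof uid) == (ssize_t) sizeof uid ? 0 : 1;
}

/* Hypothetical helper: returns the UID seen by a child created with
   CLONE_NEWUSER, or -1 if the child could not be created. */
static long uid_seen_in_new_userns(void)
{
    int pipefd[2];
    if (pipe(pipefd) == -1)
        return -1;

    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }

    /* Without CLONE_VM the child gets a copy of our memory, so the
       pointer to pipefd[1] is valid there. The stack grows downward on
       most architectures, so pass the top of the allocation. */
    pid_t pid = clone(report_uid, stack + CHILD_STACK_SIZE,
                      CLONE_NEWUSER | SIGCHLD, &pipefd[1]);
    close(pipefd[1]);   /* parent keeps only the read end */

    long result = -1;
    if (pid != -1) {
        uid_t uid;
        if (read(pipefd[0], &uid, sizeof uid) == (ssize_t) sizeof uid)
            result = (long) uid;
        waitpid(pid, NULL, 0);
    }
    close(pipefd[0]);
    free(stack);
    return result;
}
```

With the default /proc/sys/kernel/overflowuid, the returned value would be 65534. A complete program would normally have the parent write the child's uid_map at this point; the sketch skips that step deliberately so the unmapped case is visible.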

_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers