Hi Eric,

On Thu, Dec 27, 2012 at 6:20 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>
>> Hi Eric,
>>
>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>> <ebiederm@xxxxxxxxxxxx> wrote:
>>>
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>> ---
>>>  man2/clone.2 |   39 +++++++++++++++++++++++++++++++++++++++
>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>
>>> diff --git a/man2/clone.2 b/man2/clone.2
>>> index 0582057..4566677 100644
>>> --- a/man2/clone.2
>>> +++ b/man2/clone.2
>>> @@ -366,6 +366,45 @@ in the same
>>>  .BR clone ()
>>>  call.
>>>  .TP
>>> +.BR CLONE_NEWUSER " (since Linux 3.6)"
>>
>> Why "since Linux 3.6"? As far as I can see, CLONE_NEWUSER first gained
>> some meaning in 2.6.29.
>
> Looking at it where I have said 3.6 that is wrong.  I meant 3.5.

Okay.

> I think I made the same mistake in one or two other manpages.  Nothing
> was merged in 3.6 unfortunately.

I think the other cases have been fixed by now.

> My intent was these are the semantics of user namespaces since 3.5,
> when my rework/refocusing of them was merged.
>
> Since 3.5 all that has really happened with user namespaces is the
> uid/gid to kuid/kgid conversion, permission checks have been relaxed,
> and a few bugs have been fixed.
>
> 3.8 is huge from a usability standpoint.  3.8 is huge because setns(),
> and unshare() are now complete from a namespace perspective, and because
> enough permission checks have been relaxed in user namespaces that you
> can really start using them.
>
> But semantically from a user namespace perspective nothing really has
> changed in 3.8.
>
[...]
>> I reworked your text somewhat. Could you please review the following:
>>
>> [[
>> CLONE_NEWUSER
>>        (This flag first became meaningful for clone() in Linux
>>        2.6.29, but the implementation of user namespaces was
>>        only completed in Linux 3.8.)
>
> Long rant about 2.6.29 vs 3.8 above.
> I think what we need to say is:
>
>        (This flag first became meaningful for clone() in Linux
>        2.6.29, the current semantics were merged in
>        3.5, and user namespaces only really became usable in 3.8.)

Yup. I've done something like that now.

>>        If CLONE_NEWUSER is set, then create the process in a
>>        new user namespace.  If this flag is not set, then (as
>>        with fork(2)) the process is created in the same user
>>        namespace as the calling process.
>>
>>        A user namespace provides an isolated environment for
>>        security related identifiers, in particular, user IDs,
>>        group IDs, keys (see keyctl(2)), and capabilities.
>>
>>        When a user namespace is created, it starts out without
>>        a mapping of user IDs (group IDs) to the parent user
>>        namespace.  The desired mapping of user IDs (group IDs)
>>        to the parent user namespace may be set by writing into
>>        /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>
> /proc/[pid]/projid_map deserves a mention.  Not that
> I am a fan of project IDs, or that xfs, where they are
> implemented, has been converted yet, but....

Would you be able to send a patch documenting this in proc(5)?

>>        The first process in a user namespace starts out with a
>>        complete set of capabilities with respect to the new
>>        user namespace.
>>
>>        System calls that return user IDs (group IDs) will
>>        return either the user ID (group ID) mapped into the
>>        current user namespace if there is a mapping, or the
>>        overflow user ID (group ID); the default value for the
>>        overflow user ID (group ID) is 65534.  See the
>>        descriptions of /proc/sys/kernel/overflowuid and
>>        /proc/sys/kernel/overflowgid in proc(5).
>>
>>        Starting with Linux 3.8, no privileges are needed to
>>        create a user namespace, and mount, PID, IPC, net, and
>>        UTS namespaces can be created with just the
>>        CAP_SYS_ADMIN capability in the caller's user namespace.
>>
>>        Over the years, there have been a lot of features that
>>        have been added to the Linux kernel that are only
>>        available to privileged users because of their potential
>>        to confuse set-user-ID-root applications.  In general, it
>>        becomes safe to allow the root user in a user namespace
>>        to use those features because it is impossible, while in
>>        a user namespace, to gain more privilege than the root
>>        user of a user namespace has.
>
> I don't have any problems with this bit of text.
>
> It occurs to me that what is going on with capabilities and user
> namespaces needs to be documented better.  There was a minor bug with
> them this release cycle and I realized that, while the current
> definition makes sense and isn't hard to understand in general, in
> detail the interaction of capabilities and user namespaces is hard
> to describe.
>
> I think capabilities and user namespaces are the work of a future patch
> however.

Okay.

So, below, a new iteration of the text. Could you please check it
over, and note any errors to be fixed or improvements to be made.

Thanks,

Michael

CLONE_NEWUSER
       (This flag first became meaningful for clone() in Linux
       2.6.23, the current clone() semantics were merged in
       Linux 3.5, and the final pieces to make the user
       namespaces completely usable were merged in Linux 3.8.)

       If CLONE_NEWUSER is set, then create the process in a
       new user namespace.  If this flag is not set, then (as
       with fork(2)) the process is created in the same user
       namespace as the calling process.

       A user namespace provides an isolated environment for
       security related identifiers, in particular, user IDs,
       group IDs, keys (see keyctl(2)), and capabilities.

       When a user namespace is created, it starts out without
       a mapping of user IDs (group IDs) to the parent user
       namespace.  The desired mapping of user IDs (group IDs)
       to the parent user namespace may be set by writing into
       /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
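
For illustration only, here is a minimal Python sketch of the mapping
step described in the paragraph above. It assumes the three-field line
format used by these files (ID inside the new namespace, ID in the
parent namespace, length of the range). The helper names
format_id_map() and write_id_map() and the proc_root parameter are
hypothetical, introduced just for this example; they are not part of
any kernel or library interface.

```python
import os


def format_id_map(entries):
    """Render (inside_id, outside_id, count) triples as map-file text.

    Assumption: one "inside outside count" line per mapped range.
    """
    lines = []
    for inside, outside, count in entries:
        if inside < 0 or outside < 0 or count <= 0:
            raise ValueError("IDs must be >= 0 and count must be > 0")
        lines.append(f"{inside} {outside} {count}\n")
    return "".join(lines)


def write_id_map(pid, kind, entries, proc_root="/proc"):
    """Write a mapping to <proc_root>/<pid>/<kind> and return the path.

    kind must be "uid_map" or "gid_map". proc_root is a hypothetical
    knob so the helper can be exercised against an ordinary directory.
    """
    if kind not in ("uid_map", "gid_map"):
        raise ValueError("kind must be 'uid_map' or 'gid_map'")
    path = os.path.join(proc_root, str(pid), kind)
    data = format_id_map(entries).encode("ascii")
    # Send the whole mapping in one write() call rather than line by
    # line, since the map is meant to be set in a single step.
    fd = os.open(path, os.O_WRONLY)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
    return path
```

With these helpers, write_id_map(child_pid, "uid_map", [(0, 1000, 1)])
would request that user ID 0 inside the child's namespace correspond
to user ID 1000 in the parent (child_pid being whatever PID the caller
obtained from clone()). Whether the kernel accepts the write depends
on the kernel version and on the privileges of the writing process.
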

       The first process in a user namespace starts out with a
       complete set of capabilities with respect to the new
       user namespace.

       System calls that return user IDs (group IDs) will
       return either the user ID (group ID) mapped into the
       current user namespace if there is a mapping, or the
       overflow user ID (group ID); the default value for the
       overflow user ID (group ID) is 65534.  See the
       descriptions of /proc/sys/kernel/overflowuid and
       /proc/sys/kernel/overflowgid in proc(5).

       Use of this flag requires a kernel configured with the
       CONFIG_USER_NS option.  Before Linux 3.8, use of
       CLONE_NEWUSER required that the caller have three
       capabilities: CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID.
       Starting with Linux 3.8, no privileges are needed to
       create a user namespace, and mount, PID, IPC, net, and
       UTS namespaces can be created with just the
       CAP_SYS_ADMIN capability in the caller's user namespace.

       Over the years, there have been a lot of features that
       have been added to the Linux kernel that are only
       available to privileged users because of their potential
       to confuse set-user-ID-root applications.  In general, it
       becomes safe to allow the root user in a user namespace
       to use those features because it is impossible, while in
       a user namespace, to gain more privilege than the root
       user of a user namespace has.

--
Michael Kerrisk
Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface"; http://man7.org/tlpi/
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers