"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: > Hi Eric, > > On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman > <ebiederm@xxxxxxxxxxxx> wrote: >> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >> --- >> man2/clone.2 | 39 +++++++++++++++++++++++++++++++++++++++ >> 1 files changed, 39 insertions(+), 0 deletions(-) >> >> diff --git a/man2/clone.2 b/man2/clone.2 >> index 0582057..4566677 100644 >> --- a/man2/clone.2 >> +++ b/man2/clone.2 >> @@ -366,6 +366,45 @@ in the same >> .BR clone () >> call. >> .TP >> +.BR CLONE_NEWUSER " (since Linux 3.6)" > > Why "since Linux 3.6"? As fas as I can see, CLONE_NEWUSER first gained > some meaning in 2.6.29. Looking at it where I have said 3.6 that is wrong. I meant 3.5. I think I made the same mistake in one or two other manpages. Nothing was merged in 3.6 unfortunately. My intent was these are the semantics of user namespaces since 3.5, when my rework/refocusing of them was merged. Since 3.5 all that has really happened with user namespaces is the uid/gid to kuid/kgid conversion, permission checks have been relaxed, and a few bugs have been fixed. 3.8 is huge from a usability standpoint. 3.8 is huge because setns(), and unshare() are now complete from a namespace perspective, and because enough permission checks have been relaxed in user namespaces that you can really start using them. But semantically from a user namespace perspective nothing really has changed in 3.8. >> +If >> +.B CLONE_NEWUSER >> +is set, the create the process in a new user namespace. If this flag is not set, then (as with >> +.BR fork (2)), >> +the process is created in the same user namespace as the calling process. >> + >> +A user namespace provides an isolated environment for security related identifiers in particular >> +uids, gids, keys (see >> +.BR keyctl (2)), >> +and capabilities. >> + >> +When a user namespace is created it initially starts out without a mapping of uids and gids >> +to the parent user namespace. 
>> The desired mapping of uids to the parent user namespace
>> +may be set by writting into
>> +.IR /proc/[pid]/uid_map.
>> +The desired mapping of gids to the parent user namespace may be set by writinng into
>> +.IR /proc/[pid]/gid_map.
>> +
>> +The first process in a user namespace starts out with a complete set of capabilities with
>> +respect to the new user namespace.
>> +
>> +syscalls that return uids and gids will either return the uid or gid mapped into the current
>> +user namespace if there is a mapping or depending on the context will return either
>> +the overflowuid (default 65534) or the overflowgid (default 65534). See
>> +.IR /proc/sys/kernel/overflowuid, /proc/sys/kernel/overflowgid
>> +
>> +As of Linux 3.8 no priviliges are needed to create a user namespace,
>> +and mount, pid, ipc, net, uts namespaces can be created with just
>> +CAP_SYS_ADMIN privileges in your current user namespace.
>> +
>> +Over the years there have been a lot of features that have been added
>> +to the linux kernel that are only available to privileged users
>> +because of their potential to confuse setuid root applications. In
>> +general it becomes safe to allow the root user in a user namespace to
>> +use those features because it is impossible while in a user namespace
>> +to gain more privilege than the root user of a user namespace has.
>> +
>> +.TP
>>  .BR CLONE_NEWPID " (since Linux 2.6.24)"
>>  .\" This explanation draws a lot of details from
>>  .\" http://lwn.net/Articles/259217/
>
> I reworked your text somewhat. Could you please review the following:
>
> [[
> CLONE_NEWUSER
> (This flag first became meaningful for clone() in Linux
> 2.6.29, but the implementation of user namespaces was
> only completed in Linux 3.8.)

Long rant about 2.6.29 vs 3.8 above.  I think what we need to say is:

(This flag first became meaningful for clone() in Linux 2.6.29, the
current semantics were merged in 3.5, and user namespaces only
really became usable in 3.8.)
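
To make the mapping and overflow paragraphs in the patch above concrete,
here is a rough Python sketch of the lookup they describe.  It is only an
illustration, not how the kernel implements it.  It assumes the
three-field line format of uid_map and gid_map (first ID inside the
namespace, first ID in the parent namespace, length of the range), which
is documented in the man pages rather than in this thread.  The names
parse_id_map, map_into_namespace and OVERFLOW_ID are made up for the
example.

```python
# Illustrative only: a user-space model of the ID translation, not kernel code.
# Assumed default of /proc/sys/kernel/overflowuid and overflowgid.
OVERFLOW_ID = 65534


def parse_id_map(text):
    """Parse uid_map/gid_map style text into (inside, outside, length) tuples.

    Assumes each non-empty line holds three whitespace-separated integers:
    first ID inside the namespace, first ID in the parent namespace, and
    the number of consecutive IDs covered.
    """
    extents = []
    for line in text.splitlines():
        if not line.strip():
            continue
        inside, outside, length = (int(field) for field in line.split())
        extents.append((inside, outside, length))
    return extents


def map_into_namespace(outside_id, extents, overflow_id=OVERFLOW_ID):
    """Translate a parent-namespace ID into the namespace described by extents.

    IDs that fall outside every mapped range are reported as the overflow ID.
    """
    for inside, outside, length in extents:
        if outside <= outside_id < outside + length:
            return inside + (outside_id - outside)
    return overflow_id


if __name__ == "__main__":
    extents = parse_id_map("0 1000 1\n")
    print(map_into_namespace(1000, extents))  # 0
    print(map_into_namespace(0, extents))     # 65534
```

With a map of "0 1000 1", user ID 1000 in the parent shows up as 0 inside
the namespace, while the parent's root (ID 0) has no mapping and is
reported as 65534.
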
> If CLONE_NEWUSER is set,
> then create the process in a new user namespace. If
> this flag is not set, then (as with fork(2)) the process
> is created in the same user namespace as the calling
> process.
>
> A user namespace provides an isolated environment for
> security related identifiers, in particular, user IDs,
> group IDs, keys (see keyctl(2)), and capabilities.
>
> When a user namespace is created, it starts out without
> a mapping of user IDs (group IDs) to the parent user
> namespace. The desired mapping of user IDs (group IDs)
> to the parent user namespace may be set by writing into
> /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).

/proc/[pid]/projid_map deserves a mention.  Not that I am a fan of
project IDs, or that XFS, where they are implemented, has been
converted yet, but....

> The first process in a user namespace starts out with a
> complete set of capabilities with respect to the new
> user namespace.
>
> System calls that return user IDs (group IDs) will
> return either the user ID (group ID) mapped into the
> current user namespace if there is a mapping, or the
> overflow user ID (group ID); the default value for the
> overflow user ID (group ID) is 65534. See the descriptions
> of /proc/sys/kernel/overflowuid and
> /proc/sys/kernel/overflowgid in proc(5).
>
> Starting with Linux 3.8, no privileges are needed to
> create a user namespace, and mount, PID, IPC, net, and
> UTS namespaces can be created with just the
> CAP_SYS_ADMIN capability in the caller's user namespace.
>
> Over the years, there have been a lot of features that
> have been added to the Linux kernel that are only available
> to privileged users because of their potential to
> confuse set-user-ID-root applications. In general, it
> becomes safe to allow the root user in a user namespace
> to use those features because it is impossible, while in
> a user namespace, to gain more privilege than the root
> user of a user namespace has.

I don't have any problems with this bit of text.

It occurs to me that what is going on with capabilities and user
namespaces needs to be documented better.  There was a minor bug with
them this release cycle, and I realized that while the current
definition makes sense and isn't hard to understand in general, in
detail the interaction of capabilities and user namespaces is hard to
describe.

I think documenting capabilities and user namespaces is the work of a
future patch, however.

> ]]
>
> Thanks,
>
> Michael
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers