"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:

> Hi Eric,
>
> On Thu, Dec 27, 2012 at 6:20 PM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes:
>>
>>> Hi Eric,
>>>
>>> On Tue, Nov 27, 2012 at 1:46 AM, Eric W. Biederman
>>> <ebiederm@xxxxxxxxxxxx> wrote:
>>>>
>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>>> ---
>>>>  man2/clone.2 | 39 +++++++++++++++++++++++++++++++++++++++
>>>>  1 files changed, 39 insertions(+), 0 deletions(-)
>>>>
>>>> diff --git a/man2/clone.2 b/man2/clone.2
>>>> index 0582057..4566677 100644
>>>> --- a/man2/clone.2
>>>> +++ b/man2/clone.2
>>>> @@ -366,6 +366,45 @@ in the same
>>>>  .BR clone ()
>>>>  call.
>>>>  .TP
>>>> +.BR CLONE_NEWUSER " (since Linux 3.6)"
>>>
>>> Why "since Linux 3.6"? As far as I can see, CLONE_NEWUSER first gained
>>> some meaning in 2.6.29.
>>
>> Looking at it, where I have said 3.6, that is wrong. I meant 3.5.
>
> Okay.
>
>> I think I made the same mistake in one or two other manpages. Nothing
>> was merged in 3.6 unfortunately.
>
> I think the other cases have been fixed by now.
>
>> My intent was that these are the semantics of user namespaces since 3.5,
>> when my rework/refocusing of them was merged.
>>
>> Since 3.5 all that has really happened with user namespaces is the
>> uid/gid to kuid/kgid conversion, permission checks have been relaxed,
>> and a few bugs have been fixed.
>>
>> 3.8 is huge from a usability standpoint. 3.8 is huge because setns()
>> and unshare() are now complete from a namespace perspective, and because
>> enough permission checks have been relaxed in user namespaces that you
>> can really start using them.
>>
>> But semantically from a user namespace perspective nothing really has
>> changed in 3.8.
>>
> [...]
>
>>> I reworked your text somewhat.
>>> Could you please review the following:
>>>
>>> [[
>>> CLONE_NEWUSER
>>> (This flag first became meaningful for clone() in Linux
>>> 2.6.29, but the implementation of user namespaces was
>>> only completed in Linux 3.8.)
>>
>> Long rant about 2.6.29 vs 3.8 above. I think what we need to say is:
>>
>> (This flag first became meaningful for clone() in Linux
>> 2.6.29, the current semantics were merged in
>> 3.5, and user namespaces only really became usable in 3.8.)
>
> Yup. I've done something like that now.
>
>>> If CLONE_NEWUSER is set,
>>> then create the process in a new user namespace. If
>>> this flag is not set, then (as with fork(2)) the process
>>> is created in the same user namespace as the calling
>>> process.
>>>
>>> A user namespace provides an isolated environment for
>>> security-related identifiers, in particular, user IDs,
>>> group IDs, keys (see keyctl(2)), and capabilities.
>>>
>>> When a user namespace is created, it starts out without
>>> a mapping of user IDs (group IDs) to the parent user
>>> namespace. The desired mapping of user IDs (group IDs)
>>> to the parent user namespace may be set by writing into
>>> /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>>
>> /proc/[pid]/projid_map deserves a mention. Not that
>> I am a fan of project ids, or that xfs, where they are
>> implemented, has been converted yet, but....
>
> Would you be able to send a patch documenting this in proc(5)?

Sure. I don't know why I didn't mention projid in my earlier patch.
Same story: fewer permission checks. Silly me.

>>> The first process in a user namespace starts out with a
>>> complete set of capabilities with respect to the new
>>> user namespace.
>>>
>>> System calls that return user IDs (group IDs) will
>>> return either the user ID (group ID) mapped into the
>>> current user namespace if there is a mapping, or the
>>> overflow user ID (group ID); the default value for the
>>> overflow user ID (group ID) is 65534.
>>> See the descriptions of /proc/sys/kernel/overflowuid and
>>> /proc/sys/kernel/overflowgid in proc(5).
>>>
>>> Starting with Linux 3.8, no privileges are needed to
>>> create a user namespace, and mount, PID, IPC, net, and
>>> UTS namespaces can be created with just the
>>> CAP_SYS_ADMIN capability in the caller's user namespace.
>>>
>>> Over the years, there have been a lot of features that
>>> have been added to the Linux kernel that are only
>>> available to privileged users because of their potential to
>>> confuse set-user-ID-root applications. In general, it
>>> becomes safe to allow the root user in a user namespace
>>> to use those features because it is impossible, while in
>>> a user namespace, to gain more privilege than the root
>>> user of a user namespace has.
>>
>> I don't have any problems with this bit of text.
>>
>> It occurs to me that what is going on with capabilities and user
>> namespaces needs to be documented better. There was a minor bug with
>> them this release cycle, and I realized that while the current definition
>> makes sense and isn't hard to understand in general, in detail the
>> interaction of capabilities and user namespaces is hard to describe.
>>
>> I think capabilities and user namespaces are the work of a future patch,
>> however.
>
> Okay. So, below, a new iteration of the text. Could you please check
> it over, and note any errors to be fixed or improvements to be made.
>
> Thanks,
>
> Michael
>
> CLONE_NEWUSER
> (This flag first became meaningful for clone() in Linux
> 2.6.23, the current clone() semantics were merged in
> Linux 3.5, and the final pieces to make the user
> namespaces completely usable were merged in Linux 3.8.)
>
> If CLONE_NEWUSER is set, then create the process in a
> new user namespace. If this flag is not set, then (as
> with fork(2)) the process is created in the same user
> namespace as the calling process.
>
> A user namespace provides an isolated environment for
> security-related identifiers, in particular, user IDs,
> group IDs, keys (see keyctl(2)), and capabilities.
>
> When a user namespace is created, it starts out without
> a mapping of user IDs (group IDs) to the parent user
> namespace. The desired mapping of user IDs (group IDs)
> to the parent user namespace may be set by writing into
> /proc/[pid]/uid_map (/proc/[pid]/gid_map); see proc(5).
>
> The first process in a user namespace starts out with a
> complete set of capabilities with respect to the new
> user namespace.
>
> System calls that return user IDs (group IDs) will
> return either the user ID (group ID) mapped into the
> current user namespace if there is a mapping, or the
> overflow user ID (group ID); the default value for the
> overflow user ID (group ID) is 65534. See the
> descriptions of /proc/sys/kernel/overflowuid and
> /proc/sys/kernel/overflowgid in proc(5).
>
> Use of this flag requires a kernel configured with the
> CONFIG_USER_NS option. Before Linux 3.8, use of
> CLONE_NEWUSER required that the caller have three
> capabilities: CAP_SYS_ADMIN, CAP_SETUID, and CAP_SETGID.
> Starting with Linux 3.8, no privileges are needed to
> create a user namespace, and mount, PID, IPC, net, and
> UTS namespaces can be created with just the
> CAP_SYS_ADMIN capability in the caller's user namespace.
>
> Over the years, there have been a lot of features that
> have been added to the Linux kernel that are only
> available to privileged users because of their potential to
> confuse set-user-ID-root applications. In general, it
> becomes safe to allow the root user in a user namespace
> to use those features because it is impossible, while in
> a user namespace, to gain more privilege than the root
> user of a user namespace has.

I don't see anything wrong with that text.

Happy New Year.
Eric
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers