Hi Eric, On Thu, Dec 27, 2012 at 6:40 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: > >> Hi Eric, >> >> Some questions below. > > A quick note. Getting the permission checks correct has been a little > more interesting that I would have preferred. > > I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns() > install methods except the user namespace. Not a change in pre 3.8 > behavior but a change to my patch, and possibly a documentation change > below. > >> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman >> <ebiederm@xxxxxxxxxxxx> wrote: >>> >>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >>> --- >>> man2/setns.2 | 41 +++++++++++++++++++++++++++++++++-------- >>> 1 files changed, 33 insertions(+), 8 deletions(-) >>> >>> diff --git a/man2/setns.2 b/man2/setns.2 >>> index 6aa01e1..63b04dc 100644 >>> --- a/man2/setns.2 >>> +++ b/man2/setns.2 >>> @@ -48,6 +48,18 @@ must refer to a network namespace. >>> .BR CLONE_NEWUTS >>> .I fd >>> must refer to a UTS namespace. >>> +.TP >>> +.BR CLONE_NEWPID >>> +.I fd >>> +must refer to a PID namespace. >>> +.TP >>> +.BR CLONE_NEWUSER >>> +.I fd >>> +must refer to a user namespace. >>> +.TP >>> +.BR CLONE_NEWNS >>> +.I fd >>> +must refer to a mount namespace. >>> .PP >>> Specifying >>> .I nstype >>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type. >>> .IR fd >>> if the file descriptor was opened by another process and, for example, >>> passed to the caller via a UNIX domain socket.) >>> + >>> +The pid namespace is a little different. Reassociating the calling >>> +thread with a pid namespace only changes the pid namespace that the >>> +child processes will be created in. >>> + >>> +Changing the pid namespace for child processes is only allowed if the >>> +pid namespace specified by >>> +.IR fd >>> +is a child pid namespace of the pid namespace of the current thread. >> >> I assume "current thread" above should be "calling thread", right? > > What I mean in "current" from a kernel perspective. > > It should be just "caller". Okay. Changed. > Threads must share a pid namespace so mentioning threads seems wrong. > >>> + >>> +A multi-threaded process may not change user namespace with setns. A >>> +process may not reassociate the thread with the current user >>> +namespace. >> >> What do you mean by "the current user nsamesapce"? > > fd = open("/proc/self/ns/user"); > setns(fd) -> -EINVAL. > > So from a userspace perspective I mean "the callers user namespace". > >>> The process reassociating itself with a user namespace >>> +must have CAP_SYS_ADMIN privileges in the target user namespace. >>> >>> +A process may not be reassociated with a new mount namespace if it is >>> +multi-threaded >> >> I tried to verify the precdeing two lines from the kernel source, but >> did not work out where this check is made. Where is it? > > kernel/user_namespace.c:userns_install() > fs/namespace.c:mntns_install() Thanks. > A couple of the security checks have been pushed down into a per > namespace context, because the exact check that makes sense depends on > the namespace. > >>> or it does not possess both CAP_SYS_CHROOT privileges >>> +and CAP_SYS_ADMIN rights over the target mount namespace. >> >> Could you please expand/clarify the preceding two lines. As they >> stand, I don't really understand them. > > Ugh. The text is slightly wrong. > > The code is: > if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) || > !nsown_capable(CAP_SYS_CHROOT) || > !nsown_capable(CAP_SYS_ADMIN)) > return -EPERM; > > Basically you aren't allowed change your mount namespace into > a mount namespace that doesn't see you as the all powerful root > able to mount and unmount filesystems. > > You aren't allowed to change your mount namespace unless you possesses > CAP_SYS_CHROOT and CAP_SYS_ADMIN. Okay -- reworded. So, I've done some more reworking of the text, which now reads as folows. Could you please check this (and see my questions below). CLONE_NEWPID behaves somewhat differently from the other nstype values: reassociating the calling thread with a PID namespace only changes the PID namespace that child processes of the caller will be created in; it does not change the PID namespace of the caller itself. I reworked the preceding piece a lot. Is it correct still? Reassoci‐ ating with a PID namespace is only allowed if the PID namespace specified by fd is a descendant (child, grand‐ child, etc.) Is the preceding sentence correct? (You talked only of children in your original patch, but I believe it's more general than that.) PID namespace of the PID namespace of the caller. A multi-threaded process may not change user namespace with setns(). A process may not reassociate the thread with the caller's user namespace. What does the last sentence above *mean*? I don't understand it. A process reassociat‐ ing itself with a user namespace must have CAP_SYS_ADMIN privileges in the target user namespace. A process may not be reassociated with a new mount names‐ pace if it is multi-threaded. Changing the mount names‐ pace requires that the caller possess both CAP_SYS_CHROOT and CAP_SYS_ADMIN capabilities. Re the last sentence: are those capabilities required in (1) the target namespace, or (2) the source namespace, or (3) both? I suspect (1), but please confirm. Thanks, Michael -- Michael Kerrisk Linux man-pages maintainer; http://www.kernel.org/doc/man-pages/ Author of "The Linux Programming Interface"; http://man7.org/tlpi/ _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers