"Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: > Hi Eric, > > On Thu, Dec 27, 2012 at 6:40 PM, Eric W. Biederman > <ebiederm@xxxxxxxxxxxx> wrote: >> "Michael Kerrisk (man-pages)" <mtk.manpages@xxxxxxxxx> writes: >> >>> Hi Eric, >>> >>> Some questions below. >> >> A quick note. Getting the permission checks correct has been a little >> more interesting that I would have preferred. >> >> I had to add a nsown_capable(CAP_SYS_ADMIN) check to all of the setns() >> install methods except the user namespace. Not a change in pre 3.8 >> behavior but a change to my patch, and possibly a documentation change >> below. >> >>> On Tue, Nov 27, 2012 at 1:48 AM, Eric W. Biederman >>> <ebiederm@xxxxxxxxxxxx> wrote: >>>> >>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >>>> --- >>>> man2/setns.2 | 41 +++++++++++++++++++++++++++++++++-------- >>>> 1 files changed, 33 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/man2/setns.2 b/man2/setns.2 >>>> index 6aa01e1..63b04dc 100644 >>>> --- a/man2/setns.2 >>>> +++ b/man2/setns.2 >>>> @@ -48,6 +48,18 @@ must refer to a network namespace. >>>> .BR CLONE_NEWUTS >>>> .I fd >>>> must refer to a UTS namespace. >>>> +.TP >>>> +.BR CLONE_NEWPID >>>> +.I fd >>>> +must refer to a PID namespace. >>>> +.TP >>>> +.BR CLONE_NEWUSER >>>> +.I fd >>>> +must refer to a user namespace. >>>> +.TP >>>> +.BR CLONE_NEWNS >>>> +.I fd >>>> +must refer to a mount namespace. >>>> .PP >>>> Specifying >>>> .I nstype >>>> @@ -63,6 +75,25 @@ and wants to ensure that the namespace is of a particular type. >>>> .IR fd >>>> if the file descriptor was opened by another process and, for example, >>>> passed to the caller via a UNIX domain socket.) >>>> + >>>> +The pid namespace is a little different. Reassociating the calling >>>> +thread with a pid namespace only changes the pid namespace that the >>>> +child processes will be created in. >>>> + >>>> +Changing the pid namespace for child processes is only allowed if the >>>> +pid namespace specified by >>>> +.IR fd >>>> +is a child pid namespace of the pid namespace of the current thread. >>> >>> I assume "current thread" above should be "calling thread", right? >> >> What I mean in "current" from a kernel perspective. >> >> It should be just "caller". > > Okay. Changed. > >> Threads must share a pid namespace so mentioning threads seems wrong. >> >>>> + >>>> +A multi-threaded process may not change user namespace with setns. A >>>> +process may not reassociate the thread with the current user >>>> +namespace. >>> >>> What do you mean by "the current user nsamesapce"? >> >> fd = open("/proc/self/ns/user"); >> setns(fd) -> -EINVAL. >> >> So from a userspace perspective I mean "the callers user namespace". >> >>>> The process reassociating itself with a user namespace >>>> +must have CAP_SYS_ADMIN privileges in the target user namespace. >>>> >>>> +A process may not be reassociated with a new mount namespace if it is >>>> +multi-threaded >>> >>> I tried to verify the precdeing two lines from the kernel source, but >>> did not work out where this check is made. Where is it? >> >> kernel/user_namespace.c:userns_install() >> fs/namespace.c:mntns_install() > > Thanks. > >> A couple of the security checks have been pushed down into a per >> namespace context, because the exact check that makes sense depends on >> the namespace. >> >>>> or it does not possess both CAP_SYS_CHROOT privileges >>>> +and CAP_SYS_ADMIN rights over the target mount namespace. >>> >>> Could you please expand/clarify the preceding two lines. As they >>> stand, I don't really understand them. >> >> Ugh. The text is slightly wrong. >> >> The code is: >> if (!ns_capable(mnt_ns->user_ns, CAP_SYS_ADMIN) || >> !nsown_capable(CAP_SYS_CHROOT) || >> !nsown_capable(CAP_SYS_ADMIN)) >> return -EPERM; >> >> Basically you aren't allowed change your mount namespace into >> a mount namespace that doesn't see you as the all powerful root >> able to mount and unmount filesystems. >> >> You aren't allowed to change your mount namespace unless you possesses >> CAP_SYS_CHROOT and CAP_SYS_ADMIN. > > Okay -- reworded. > > So, I've done some more reworking of the text, which now reads as > folows. Could you please check this (and see my questions below). > > CLONE_NEWPID behaves somewhat differently from the other > nstype values: reassociating the calling thread with a > PID namespace only changes the PID namespace that child > processes of the caller will be created in; it does not > change the PID namespace of the caller itself. > I reworked the preceding piece a lot. Is it correct still? > > Reassoci‐ > ating with a PID namespace is only allowed if the PID > namespace specified by fd is a descendant (child, grand‐ > child, etc.) > > Is the preceding sentence correct? (You talked only of children in > your original patch, but I believe it's more general than that.) Yes. That is correct. > PID namespace of the PID namespace of the > caller. > > A multi-threaded process may not change user namespace > with setns(). A process may not reassociate the thread > with the caller's user namespace. > > What does the last sentence above *mean*? I don't understand it. So the set of checks are: /* Don't allow gaining capabilities by reentering * the same user namespace. */ if (user_ns == current_user_ns()) return -EINVAL; /* Threaded processes may not enter a different user namespace */ if (atomic_read(¤t->mm->mm_users) > 1) return -EINVAL; if (!ns_capable(user_ns, CAP_SYS_ADMIN)) return -EPERM; Rereading it looks like I was going fast and suffered from dropping important words. A multi-threaded process may not change it's user namespace with setns(). aka if you have threads setns for a user namespace will fail. A process may not change the user namespace to the caller's user namespace via setns. This is important because changing to a user namespace via setns implies gaining all caps, and you should not be able to gain all caps over your current user namespace. Hopefully that clears it up. > A process reassociat‐ > ing itself with a user namespace must have CAP_SYS_ADMIN > privileges in the target user namespace. > > A process may not be reassociated with a new mount names‐ > pace if it is multi-threaded. Changing the mount names‐ > pace requires that the caller possess both CAP_SYS_CHROOT > and CAP_SYS_ADMIN capabilities. > > Re the last sentence: are those capabilities required in (1) the > target namespace, or (2) the source namespace, or (3) both? I suspect > (1), but please confirm. CAP_SYS_ADMIN is required in the current user namespace. CAP_SYS_ADMIN is required over the target mount namesapce. CAP_SYS_CHROOT is required in the current user namespace. Eric _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers