On Tue, 2007-11-27 at 16:38 -0600, Serge E. Hallyn wrote: > Quoting Stephen Smalley (sds@xxxxxxxxxxxxx): > > On Tue, 2007-11-27 at 10:11 -0600, Serge E. Hallyn wrote: > > > Quoting Crispin Cowan (crispin@xxxxxxxxxxxxxxxx): > > > > Just the name "sys_hijack" makes me concerned. > > > > > > > > This post describes a bunch of "what", but doesn't tell us about "why" > > > > we would want this. What is it for? > > > > > > Please see my response to Casey's email. > > > > > > > And I second Casey's concern about careful management of the privilege > > > > required to "hijack" a process. > > > > > > Absolutely. We're definately still in RFC territory. > > > > > > Note that there are currently several proposed (but no upstream) ways to > > > accomplish entering a namespace: > > > > > > 1. bind_ns() is a new pair of syscalls proposed by Cedric. An > > > nsproxy is given an integer id. The id can be used to enter > > > an nsproxy, basically a straight current->nsproxy = target_nsproxy; > > > > > > 2. I had previously posted a patchset on top of the nsproxy > > > cgroup which allowed entering a nsproxy through the ns cgroup > > > interface. > > > > > > There are objections to both those patchsets because simply switching a > > > task's nsproxy using a syscall or file write in the middle of running a > > > binary is quite unsafe. Eric Biederman had suggested using ptrace or > > > something like it to accomplish the goal. > > > > > > Just using ptrace is however not safe either. You are inheriting *all* > > > of the target's context, so it shouldn't be difficult for a nefarious > > > container/vserver admin to trick the host admin into running something > > > which gives the container/vserver admin full access to the host. > > > > I don't follow the above - with ptrace, you are controlling a process > > already within the container (hence in theory already limited to its > > container), and it continues to execute within that container. What's > > the issue there? > > Hmm, yeah, I may have overspoken - I'm not good at making up exploits > but while I see it possible to confuse the host admin by setting bogus > environment, I guess there may not be an actual exploit. > > Still after the fork induced through ptrace, we'll have to execute a > file out of the hijacked process' namespaces and path (unless we get > *really* 'exotic'). With hijack, execution continues under the caller's > control, which I do much prefer. > > The remaining advantages of hijack over ptrace (beside "using ptrace for > that is crufty") are > > 1. not subject to pid wraparound (when doing hijack_cgroup > or hijack_ns) > 2. ability to enter a namespace which has no active processes So possibly I'm missing something, but the situation with hijack seems more exploitable than ptrace to me - you've created a hybrid task with one foot in current's world (open files, tty, connection to parent, executable) and one foot in the target's world (namespaces, uid/gid) which can then be leveraged by other tasks within the target's world/container as a way of breaking out of the container. No? > These also highlight selinux issues. In the case of hijacking an > empty cgroup, there is no security context (because there is no task) so > the context of 'current' will be used. In the case of hijacking a > populated cgroup, a task is chosen "at random" to be the hijack source. Seems like you might be better off with a single operation for creating a new task within a given namespace set / cgroup rather than trying to handle multiple situations with different semantics / inheritance behavior. IOW, forget about hijacking a specific pid or picking a task at random from a populated cgroup - just always initialize the state of the newly created task in the same manner based solely on elements of the caller's state and the cgroup's state. > So there are two ways to look at deciding which context to use. Since > control continues in the original acting process' context, we might > want the child to continue in its context. However if the process > creates any objects in the virtual server, we don't want them > mislabeled, so we might want the task in the hijacked task's context. I suspect that we want to continue in the parent's context, and then the program can always use setfscreatecon() or exec a helper in a different context if it wants to create files with contexts tailored to the target. > Sigh. So here's why I thought I'd punt on selinux at least until I had > a working selinux-enforced container/vserver :) -- Stephen Smalley National Security Agency -- This message was distributed to subscribers of the selinux mailing list. If you no longer wish to subscribe, send mail to majordomo@xxxxxxxxxxxxx with the words "unsubscribe selinux" without quotes as the message.