On Sun, Oct 19, 2014 at 9:55 PM, Eric W.Biederman <ebiederm@xxxxxxxxxxxx> wrote: > > > On October 19, 2014 1:26:29 PM CDT, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: >>On Sat, Oct 18, 2014 at 10:23 PM, Eric W. Biederman >><ebiederm@xxxxxxxxxxxx> wrote: >>> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes: >>> >>>> Quoting Aditya Kali (adityakali@xxxxxxxxxx): >>>>> On Thu, Oct 16, 2014 at 2:12 PM, Serge E. Hallyn <serge@xxxxxxxxxx> >>wrote: >>>>> > Quoting Aditya Kali (adityakali@xxxxxxxxxx): >>>>> >> setns on a cgroup namespace is allowed only if >>>>> >> * task has CAP_SYS_ADMIN in its current user-namespace and >>>>> >> over the user-namespace associated with target cgroupns. >>>>> >> * task's current cgroup is descendent of the target >>cgroupns-root >>>>> >> cgroup. >>>>> > >>>>> > What is the point of this? >>>>> > >>>>> > If I'm a user logged into >>>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope and I start >>>>> > a container which is in >>>>> > /lxc/c1/user.slice/user-1000.slice/session-c12.scope/x1 >>>>> > then I will want to be able to enter the container's cgroup. >>>>> > The container's cgroup root is under my own (satisfying the >>>>> > below condition0 but my cgroup is not a descendent of the >>>>> > container's cgroup. >>>>> > >>>>> This condition is there because we don't want to do implicit cgroup >>>>> changes when a process attaches to another cgroupns. cgroupns tries >>to >>>>> preserve the invariant that at any point, your current cgroup is >>>>> always under the cgroupns-root of your cgroup namespace. But in >>your >>>>> example, if we allow a process in "session-c12.scope" container to >>>>> attach to cgroupns root'ed at "session-c12.scope/x1" container >>>>> (without implicitly moving its cgroup), then this invariant won't >>>>> hold. >>>> >>>> Oh, I see. Guess that should be workable. Thanks. >>> >>> Which has me looking at what the rules are for moving through >>> the cgroup hierarchy. >>> >>> As long as we have write access to cgroup.procs and are allowed >>> to open the file for write, we can move any of our own tasks >>> into the cgroup. So the cgroup namespace rules don't seem >>> to be a problem. >>> >>> Andy can you please take a look at the permission checks in >>> __cgroup_procs_write. >> >>The actual requirements for calling that function haven't changed, >>right? IOW, what does this have to do with cgroupns? > > Excluding user namespaces the requirements have not changed. > > The immediate correlation is that to enter a cgroupns you must first put your process in one of it's cgroups. > > So I was examining what it would take to enter the cgroup of cgroupns. > >> Is the idea >>that you want a privileged user wrt a cgroupns's userns to be able to >>use this? If so: >> >>Yes, that current_cred() thing is bogus. (Actually, this is probably >>exploitable right now if any cgroup.procs inode anywhere on the system >>lets non-root write.) (Can we have some kernel debugging option that >>makes any use of current_cred() in write(2) warn?) >> >>We really need a weaker version of may_ptrace for this kind of stuff. >>Maybe the existing may_ptrace stuff is okay, actually. But this is >>completely missing group checks, cap checks, capabilities wrt the >>userns, etc. >> >>Also, I think that, if this version of the patchset allows non-init >>userns to unshare cgroupns, then the issue of what permission is >>needed to lock the cgroup hierarchy like that needs to be addressed, >>because unshare(CLONE_NEWUSER|CLONE_NEWCGROUP) will effectively pin >>the calling task with no permission required. Bolting on a fix later >>will be a mess. > > I imagine the pinning would be like the userns. > > Ah but there is a potentially serious issue with the pinning. > With pinning we can make it impossible for root to move us to a different cgroup. > > I am not certain how serious that is but it bears thinking about. > If we don't implement pinning we should be able to implent everything with just filesystem mount options, and no new namespace required. > > Sigh. > > I am too tired tonight to see the end game in this. Possible solution: Ditch the pinning. That is, if you're outside a cgroupns (or you have a non-ns-confined cgroupfs mounted), then you can move a task in a cgroupns outside of its root cgroup. If you do this, then the task thinks its cgroup is something like "../foo" or "../../foo". While we're at it, consider making setns for a cgroupns *not* change the caller's cgroup. Is there any reason it really needs to? Thoughts? --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-api" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html