lmctfy literally supports ".." as a container name :)

On Tue, Nov 26, 2013 at 12:58 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
> Quoting Tim Hockin (thockin@xxxxxxxxxx):
>> On Mon, Nov 25, 2013 at 9:47 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
>> > Quoting Tim Hockin (thockin@xxxxxxxxxx):
> ...
>> >> > . A client (requestor 'r') can make cgroup requests over
>> >> >   /sys/fs/cgroup/manager using dbus calls. Detailed privilege
>> >> >   requirements for r are listed below.
>> >> > . The client request will pertain to an existing or new cgroup A. r's
>> >> >   privilege over the cgroup must be checked. r is said to have
>> >> >   privilege over A if A is owned by r's uid, or if A's owner is mapped
>> >> >   into r's user namespace, and r is root in that user namespace.
>> >>
>> >> Problem with this definition. Being owned-by is not the same as
>> >> has-root-in. Specifically, I may choose to give you root in your own
>> >> namespace, but you sure as heck can not increase your own memory
>> >> limit.
>> >
>> > 1. If you don't want me to change the value at all, then just don't map
>> > A's owner into the namespace. I'm uid 100000 which is root in my namespace,
>> > but I only have privilege over other uids mapped into my namespace.
>>
>> I think I understand this, but it is subtle. Maybe some examples would help?
>
> When you create a user namespace, at first it is empty, and you are 'nobody'
> (-1). Then magically some uids from the host, say 100000-101999, are mapped
> into your namespace, to uids 0-1999.
>
> Now assume you're uid 0 inside that namespace. You have privilege over your
> uids, 0-1999, which are 100000-101999 on the host.
>
> If cgroup file A is owned by host uid 0, then the owner is not mapped into
> the user namespace. uid 0 inside the namespace only gets the world access
> rights to that file.
>
> If cgroup file A is owned by host uid 100100, then uid 0 in the
> namespace has access to that file by virtue of being root, and uid 100
> in the namespace (100100 on the host) has access to the file by virtue
> of being the owner.
>
>> > 2. I've considered never allowing changes to your own cgroup. So if you're
>> > in /a/b, you can create /a/b/c and modify c's settings, but you can't modify
>> > b's. OTOH, that isn't strictly necessary - if we did allow it, then you
>> > could simply clamp /a/b's memory to what you want, and stick me in /a/b/c,
>> > so I can't escape the memory limit you wanted.
>>
>> This is different from what we do internally, but it's an interesting
>> semantic. I'm wary of how much we want to make this API about
>> enforcement of policy vs simple enactment. In other words, semantics
>> that diverge from UNIX ownership might be more complicated to
>> understand than they are worth.
>
> The semantics I gave are exactly the user namespace semantics. If you're
> not using a user namespace then they simply do not apply, and you are back
> to strict UNIX ownership semantics that you want. But allowing 'root' in
> a user namespace to have privilege over uids, without having any privilege
> outside its own namespace, must be honored for this to be usable by lxc.
>
> Like I said, on the bright side, if you don't want to care about user
> namespaces, then everything falls back to strict unix semantics - so if
> you don't want to care, you don't have to care.
>
>> > 3. I've not considered having the daemon track resource limits - i.e. creating
>> > a cgroup and saying "give it 100M swap, and if it asks, let it increase that
>> > to 200M." I'd prefer that be done incidentally through (1) and (2). Do you
>> > feel that would be insufficient?
>>
>> I think this is a higher-level issue that should not be addressed here.
>>
>> > Or maybe your question is something different and I'm missing it?
>>
>> My point was that I, as machine admin, create a memory cgroup of 100
>> MB for you and put you in it. I also give you root-in-namespace.
>> You must not be able to change 100 MB to 200 MB. From your (1) you
>> are saying that system UID 0 owns the cgroup and is NOT mapped into
>> your namespace. Therefore your definition holds. I think I can buy
>> that.
>>
>> >> > . The client request may pertain to a victim task v, which may be moved
>> >> >   to a new cgroup. In that case r's privilege over both the cgroup
>> >> >   and v must be checked. r is said to have privilege over v if v
>> >> >   is mapped in r's pid namespace, v's uid is mapped into r's user ns,
>> >> >   and r is root in its userns. Or if r and v have the same uid
>> >> >   and v is mapped in r's pid namespace.
>> >> > . r's credentials will be taken from socket's peercred, ensuring that
>> >> >   pid and uid are translated.
>> >> > . r passes PID(v) as a SCM_CREDENTIAL, so that cgmanager receives the
>> >> >   translated global pid. It will then read UID(v) from /proc/PID(v)/status,
>> >> >   which is the global uid, and check /proc/PID(r)/uid_map to see whether
>> >> >   UID is mapped there.
>> >> > . dbus-send can be enhanced to send a pid as SCM_CREDENTIAL to have
>> >> >   the kernel translate it for the reader. Only 'move task v to cgroup
>> >> >   A' will require a SCM_CREDENTIAL to be sent.
>> >> >
>> >> > Privilege requirements by action:
>> >> > * Requestor of an action (r) over a socket may only make
>> >> >   changes to cgroups over which it has privilege.
>> >> > * Requestors may be limited to a certain #/depth of cgroups
>> >> >   (to limit memory usage) - DEFER?
>> >> > * Cgroup hierarchy is responsible for resource limits
>> >> > * A requestor must either be uid 0 in its userns with victim mapped
>> >> >   into its userns, or the same uid and in same/ancestor pidns as the
>> >> >   victim
>> >> > * If r requests creation of cgroup '/x', /x will be interpreted
>> >> >   as relative to r's cgroup.
>> >> >   r cannot make changes to cgroups not
>> >> >   under its own current cgroup.
>> >>
>> >> Does this imply that r in a lower-level (farther from root) of the
>> >> hierarchy can not make requests of higher levels of the hierarchy
>> >> (closer to root), even though they have permissions as per the
>> >> definition of privilege?
>> >
>> > Right.
>>
>> Is this really a required semantic? We have use cases where
>> read-access is required to parent cgroups, which means this agent
>> could never handle reads. It's not clear that we have use cases for
>> write-access to parents, though we have talked about eventfd - is that
>> read or write access? Does this daemon want to handle event fd?
>
> Denying read access to parent cgroups is not strictly necessary to meet
> any of my requirements. Eventfd only requires an open read handle to
> the file, so that should be ok.
>
> So to support that, I guess I'd want to add a 'get-my-cgroup'
> command with controller argument, which returns the absolute
> path. Cgroups which start with a '/' are taken as absolute
> cgroup paths, as opposed to the usual, relative-to-my-own.
> It sounds like you also might want to just use '../' ?
>
> I'd refuse write access for now altogether. We can talk later, if
> someone finds a need, about a way to support conditional write
> access, but that's pretty much completely bypassing the hierarchical
> constraints :)
>
> -serge
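
For reference, the privilege rule discussed in this thread (r has privilege over cgroup A if r owns A, or if A's owner is mapped into r's user namespace and r is root in that namespace) can be sketched in Python. This is only an illustration under stated assumptions, not cgmanager code: the function names parse_uid_map, host_uid_to_ns and has_privilege are hypothetical, and the input is assumed to be uid_map text in the three-column "uid-inside-ns uid-outside-ns count" layout that /proc/PID/uid_map shows when read from the host.

```python
from typing import List, Optional, Tuple

# One uid_map entry: (first uid inside the namespace, first host uid, count).
UidMap = List[Tuple[int, int, int]]


def parse_uid_map(text: str) -> UidMap:
    """Parse uid_map text; lines without exactly three fields are skipped."""
    entries: UidMap = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 3:
            continue
        ns_start, host_start, count = (int(f) for f in fields)
        entries.append((ns_start, host_start, count))
    return entries


def host_uid_to_ns(uid_map: UidMap, host_uid: int) -> Optional[int]:
    """Return the uid as seen inside the namespace, or None if unmapped."""
    for ns_start, host_start, count in uid_map:
        if host_start <= host_uid < host_start + count:
            return ns_start + (host_uid - host_start)
    return None


def has_privilege(uid_map: UidMap, requestor_host_uid: int,
                  owner_host_uid: int) -> bool:
    """Hypothetical check of the rule quoted above.

    True if the requestor owns the cgroup, or if the requestor is uid 0
    inside the namespace and the cgroup's owner is mapped into it.
    """
    if requestor_host_uid == owner_host_uid:
        return True
    requestor_ns_uid = host_uid_to_ns(uid_map, requestor_host_uid)
    owner_ns_uid = host_uid_to_ns(uid_map, owner_host_uid)
    return requestor_ns_uid == 0 and owner_ns_uid is not None


# The mapping from the example in the thread: host uids 100000-101999
# appear as uids 0-1999 inside the namespace.
example_map = parse_uid_map("0 100000 2000\n")

# Root in the namespace (host uid 100000) vs. a cgroup owned by host uid 100100.
root_over_mapped_owner = has_privilege(example_map, 100000, 100100)
# Root in the namespace vs. a cgroup owned by host uid 0 (not mapped).
root_over_host_root = has_privilege(example_map, 100000, 0)
```

With this mapping, the first check succeeds because host uid 100100 is mapped (as uid 100), and the second fails because host uid 0 is not mapped, which matches the "100 MB cgroup owned by system UID 0" case above.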