Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
> "Serge E. Hallyn" <serge@xxxxxxxxxx> writes:
>
> > Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
> >> IIUC there are multiple ways to end up with a socket pair for which
> >> one end is in a user namespace and the other is outside of it.  That
> >> means that SCM_CREDENTIALS can be used by a process in a userns to
> >> authenticate to a process outside.
> >>
> >> This is all well and good (and, as far as I know, correct), but I'm
> >
> > And the cgroup manager I'm starting on depends on this.
> >
> >> not sure this is always the desired behavior.  In the context of a
> >> tool like Docker, it might be useful to have several user namespaces
> >> that have the *same* uids mapped.  Nonetheless, if one of those
> >> namespaces is compromised, it probably shouldn't be permitted to
> >> attack things outside the user namespace (or in the host, if any
> >> interesting uids are mapped).
> >>
> >> Would it make sense to have an option to allow a user namespace to opt
> >> into different behavior so that its users show up as the invalid uid
> >> as seen from outside (at least for SCM_CREDENTIALS and SO_PEERCRED)?
> >>
> >> Implementing this might be awkward (ok, it might actively suck due to
> >> a possible need for reference counting), but I'm wondering if it's a
> >> good idea even in principle.
> >
> > Well, I'll grant you, if I have a single directory with a socket in
> > it, and I make that the aufs or overlayfs underlay for two separate
> > mounts, which each are in different containers, then you might have
> > a problem here.
> >
> > Now maybe the answer to that is that the sockets should be created
> > in tmpfss (/run, /tmp, etc) anyway.  But the more I think about it
> > the more I, unfortunately, agree that this could be a problem.
>
> I really hate the concept of mapping a uid in some contexts and not
> others.  That seems very prone to go wrong.  Given all of the possible
> kinds of permutations I can't imagine how we would get it correct.
>
> MS_NOSUID and MS_RDONLY will help with some of the worst offenders.
> But it will still be possible for the user namespace root to call
> setuid(NNN); and create a process with that uid.  And if a unix domain
> socket isn't the only means of interacting there will still be problems.
>
> I will suggest that writing a uid mapping filesystem like overlayfs or
> perhaps as a mount option of overlayfs is likely to be a more robust
> solution in general.  Certainly that is what I originally had on the
> drawing board to solve this class of problem.

Actually an option to aufs and overlayfs to say "any unix domain socket
which is opened must first be copied to the writeable layer" would solve
the issue (at least for all reasonable cases, iiuc)

-serge
_______________________________________________
Containers mailing list
Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx
https://lists.linuxfoundation.org/mailman/listinfo/containers
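
For anyone who wants to see what the receiving side actually gets from
SO_PEERCRED, here is a minimal Linux-only sketch in Python.  The helper
name peer_credentials is made up for illustration and is not part of any
tool mentioned in this thread.  The kernel reports the uid as translated
into the user namespace of the process doing the getsockopt, and a uid
with no mapping there should come back as the overflow uid from
/proc/sys/kernel/overflowuid rather than as the real one.

```python
import os
import socket
import struct

# struct ucred on Linux is { pid_t pid; uid_t uid; gid_t gid; }:
# one signed and two unsigned 32-bit fields.
UCRED_FORMAT = "iII"


def peer_credentials(sock):
    """Return (pid, uid, gid) recorded by the kernel for the peer of sock.

    Hypothetical helper for illustration. The values are fixed when the
    socket is connected (or when socketpair() is called) and are
    translated into the caller's pid and user namespaces on read.
    """
    raw = sock.getsockopt(
        socket.SOL_SOCKET, socket.SO_PEERCRED, struct.calcsize(UCRED_FORMAT)
    )
    return struct.unpack(UCRED_FORMAT, raw)


if __name__ == "__main__":
    # Both ends of a socketpair carry the credentials of the process
    # that created the pair.
    left, right = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    try:
        pid, uid, gid = peer_credentials(left)
        print(f"peer pid={pid} uid={uid} gid={gid}")
    finally:
        left.close()
        right.close()
```

Running this in the host and again inside a user namespace with a
different uid map is a quick way to check how the same peer is reported
on each side of the socket.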