Quoting Andy Lutomirski (luto@xxxxxxxxxxxxxx):
> On Tue, Oct 14, 2014 at 3:14 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
> > Quoting Serge E. Hallyn (serge@xxxxxxxxxx):
> >> Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx):
> >> > Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
> >> >
> >> > > If a process gets access to a mount from a descendent or unrelated
> >> > > user namespace, that process should not be able to take advantage of
> >> > > setuid files or selinux entrypoints from that filesystem.
> >> > >
> >> > > This will make it safer to allow more complex filesystems to be
> >> > > mounted in non-root user namespaces.
> >> > >
> >> > > This does not remove the need for MNT_LOCK_NOSUID. The setuid,
> >> > > setgid, and file capability bits can no longer be abused if code in
> >> > > a user namespace were to clear nosuid on an untrusted filesystem,
> >> > > but this patch, by itself, is insufficient to protect the system
> >> > > from abuse of files that, when execed, would increase MAC privilege.
> >> > >
> >> > > As a more concrete explanation, any task that can manipulate a
> >> > > vfsmount associated with a given user namespace already has
> >> > > capabilities in that namespace and all of its descendents. If they
> >> > > can cause a malicious setuid, setgid, or file-caps executable to
> >> > > appear in that mount, then that executable will only allow them to
> >> > > elevate privileges in exactly the set of namespaces in which they
> >> > > are already privileged.
> >> > >
> >> > > On the other hand, if they can cause a malicious executable to
> >> > > appear with a dangerous MAC label, running it could change the
> >> > > caller's security context in a way that should not have been
> >> > > possible, even inside the namespace in which the task is confined.
> >> >
> >> > As presented this is complete and total nonsense. Mount propagation
> >> > strongly weakens if not completely breaks the assumptions you are making
> >> > in this code.
> >> >
> >> > To write any generic code that knows anything we need to capture a user
> >> > namespace on struct super.
> >> >
> >> > Further I think all we really want is to filter out security labels from
> >> > unprivileged mounts. uids/gids and the like should be completely fine
> >> > because of the uid mappings.
> >> >
> >> > Having been down the route of comparing uids as userns uid tuples I am
> >> > convinced that anything that requires us to take the user namespace into
> >> > account on a routine basis in the core will simply be broken for someone
> >> > forgetting somewhere. This looks like a design that has that kind of
> >> > susceptibility.
> >>
> >> The above paragraph is very compelling. However Andy's patch is a step
> >> in the right direction from what we've got. I think given what you say
> >> below and given Andy's rationale above, simply tweaking his patch to
> >> ignore the parent-userns loop, and return false if current_user_ns() !=
> >> mount_userns, should be right? It'll prevent a child userns from
> >> setting a selinux/apparmor entrypoint or POSIX file capabilities on a
> >> file and having the parent userns trip over those.
> >
> > Ok, Andy's fn does the opposite, which will protect the parent userns,
> > which is good.
> >
> > I suspect simply insisting that the user_ns's be equal is still better.
> > It fits better with the idea that POSIX caps (and LSM entrypoints) are
> > orthogonal to DAC. Kinda.
>
> We could tighten it even further if we compared *mount* namespaces
> instead of user namespaces. That would benefit Docker, non-userns-lxc
> and such, too (sigh).
>
> Actually, I see no good reason to insist on userns equality but not on
> mountns equality. If we're not going to trust executables in foreign
> namespaces, let's go all the way to distrust executables in all
> foreign namespaces, at least unless someone thinks of a reason this
> would break existing userspace.

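To make the variants being compared above concrete, here is a rough
userspace sketch, not kernel code. struct user_ns, struct mnt_ns and the
three helper names are made up for illustration; the actual patch and the
kernel's own types may well look different. The first helper models the
parent-userns walk as described in the patch rationale (a mount from a
descendant or unrelated userns gets no trust), the second the plain
userns-equality check, and the third the mountns-equality check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-ins, NOT the kernel's structures. */
struct user_ns {
    const struct user_ns *parent;   /* NULL for the initial namespace */
};

struct mnt_ns {
    int id;                         /* placeholder payload only */
};

/*
 * Parent-walk variant: trust the mount only if its user namespace is the
 * task's own namespace or one of its ancestors.  Mounts owned by a
 * descendant or unrelated namespace are not trusted.
 */
static bool trusted_by_ancestry(const struct user_ns *mount_userns,
                                const struct user_ns *task_userns)
{
    assert(mount_userns != NULL);

    for (const struct user_ns *ns = task_userns; ns != NULL; ns = ns->parent)
        if (ns == mount_userns)
            return true;
    return false;
}

/* Stricter variant: no walk, the two user namespaces must be identical. */
static bool trusted_by_userns_equality(const struct user_ns *mount_userns,
                                       const struct user_ns *task_userns)
{
    return mount_userns == task_userns;
}

/* Strictest variant: compare mount namespaces instead of user namespaces. */
static bool trusted_by_mntns_equality(const struct mnt_ns *mount_mntns,
                                      const struct mnt_ns *task_mntns)
{
    return mount_mntns == task_mntns;
}
```

With an initial namespace and a child of it, the walk trusts a mount owned
by the initial namespace when called from the child, while the equality
check does not. Both refuse a mount owned by the child when called from
the initial namespace.
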
I have no doubt there is code out there in production which ends up
executing /proc/pid/root/sbin/ifconfig etc. Cause, you know, you really
wanna execute whatever garbage is there... Breaking that might be a good
thing.

-serge