On Wed, Jul 15, 2015 at 3:35 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes: > >> On Wed, Jul 15, 2015 at 2:48 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote: >>> On Wed, Jul 15, 2015 at 02:46:04PM -0500, Seth Forshee wrote: >>>> Capability sets attached to files must be ignored except in the >>>> user namespaces where the mounter is privileged, i.e. s_user_ns >>>> and its descendants. Otherwise a vector exists for gaining >>>> privileges in namespaces where a user is not already privileged. >>>> >>>> Add a new helper function, in_user_ns(), to test whether a user >>>> namespace is the same as or a descendant of another namespace. >>>> Use this helper to determine whether a file's capability set >>>> should be applied to the caps constructed during exec. >>>> >>>> Signed-off-by: Seth Forshee <seth.forshee@xxxxxxxxxxxxx> >>> >>> Acked-by: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx> >>> >>> I think it's an ok behavior, though let's just go over the >>> alternatives. >>> >>> It might actually be ok to simply require that the user_ns be >>> equal. If I unshare a new userns in which a different uid is >>> mapped to root, I may not want file capabilities to be granted >>> to tasks in that ns. (On the other hand, I might be creating >>> a new user_ns specifically to not have a uid 0 mapped into it >>> at all, and only have file capabilities grant privilege) >>> >>> Conversely, if I unshare one user_ns with a MS_SHARED mnt_ns, mount >>> an ext4fs, and then (from the parent shell) unshare another user_ns >>> with the same mapping, intending it to be a "peer" to the first one >>> I'd unshared and be able to use the ext4fs it mounted. This won't >>> work here. That's probably best - the appropriate thing to do was >>> to attach to the existing user_ns. But it could end up being >>> limiting in some special cases, so I'm bringing it up here. >>> >>> Again I think what you have here is the simplest and most sensible >>> choice, so ack. >>> >> >> I think I'm missing something. Why is this separate from mount_may_suid? >> >> I can see why it would make sense to check s_user_ns (or maybe >> s_user_ns *and* the vfsmount namespace) in mount_may_suid, but I don't >> see why we need separate checks. > > So I don't quite understand your concerns that lead to the mnt_may_suid > patch. But in my limited understanding there are two distinct issues. The issue is that we need some kind of control for whether a given operation should trust a given mounted filesystem. There are two kinds of trust: trusting the fs for execve security context (nosuid controls this) and trusting it for LSM access restrictions. I think that, in an unprivileged namespace context, the latter is a bit silly -- the creator of the fs owns it, full stop. I'm talking about the former. In particular, If I unshare everything, mount a fresh FUSE, shove a setuid, fcapped, LSM-labeled thing in it, pass a file descriptor out, and have someone in the root ns execve it, and *pwned*. My suggestion is to use a single function to control this, and I called it mnt_may_suid. We can certainly debate when that function should return true, but I'm unconvinced that the conditions for LSM and for regular setuid should be different. > > 1) What do file capabilities mean on a filesystem mounted with user > namespace privileges. Where the unprivileged user can control what > resides on disk. > > That is what this patch should be about. > > Meaning and restricting those permissions to unprivileged users. I think that file caps should mean what they usually do if the execve caller's userns should trust the file. Otherwise file caps should do nothing. My original idea was that a namespace trusts a vfsmount if the namespace or one of its ancestors created the mount. Doing the same thing but with s_user_ns might also make sense. > > 2) The second issue that I think your mnt_may_suid patch is about seems > to be what to do if a mount winds up in a place we never intended. > > Aka leaks. I don't think any changes to mnt_may_suid are necessary > in that sense. However they may be useful. > > So I think your mnt_may_suid change may be worth having but so far it > seems unnecessary. There's that, too. For one thing, with my mnt_may_suid patch (or a variant that checks the vfsmount and s_user_ns), we could drop the bind-mount nosuid restrictions. If you want to bind-mount an MS_NOSUID mount without MS_NOSUID, then that's fine -- you can't do any harm. > > Which is a long way of saying this patch is fundamentally necessary, > and I am not certain about the mnt_may_suid patch. > > Am I right in understanding it's purpose? Or does this patch actually > succeed in obsoleting it? Other way around. I think that an improved mnt_may_suid patch might render this patch unnecessary. --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html