Re: [PATCH 3/7] fs: Ignore file caps in mounts from other user namespaces

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Wed, 15 Jul 2015 18:19:46 -0700

On Wed, Jul 15, 2015 at 3:35 PM, Eric W. Biederman
<ebiederm@xxxxxxxxxxxx> wrote:
> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
>
>> On Wed, Jul 15, 2015 at 2:48 PM, Serge E. Hallyn <serge@xxxxxxxxxx> wrote:
>>> On Wed, Jul 15, 2015 at 02:46:04PM -0500, Seth Forshee wrote:
>>>> Capability sets attached to files must be ignored except in the
>>>> user namespaces where the mounter is privileged, i.e. s_user_ns
>>>> and its descendants. Otherwise a vector exists for gaining
>>>> privileges in namespaces where a user is not already privileged.
>>>>
>>>> Add a new helper function, in_user_ns(), to test whether a user
>>>> namespace is the same as or a descendant of another namespace.
>>>> Use this helper to determine whether a file's capability set
>>>> should be applied to the caps constructed during exec.
>>>>
>>>> Signed-off-by: Seth Forshee <seth.forshee@xxxxxxxxxxxxx>
>>>
>>> Acked-by: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx>
>>>
>>> I think it's an ok behavior, though let's just go over the
>>> alternatives.
>>>
>>> It might actually be ok to simply require that the user_ns be
>>> equal.  If I unshare a new userns in which a different uid is
>>> mapped to root, I may not want file capabilities to be granted
>>> to tasks in that ns.  (On the other hand, I might be creating
>>> a new user_ns specifically to not have a uid 0 mapped into it
>>> at all, and only have file capabilities grant privilege)
>>>
>>> Conversely, if I unshare one user_ns with a MS_SHARED mnt_ns, mount
>>> an ext4fs, and then (from the parent shell) unshare another user_ns
>>> with the same mapping, intending it to be a "peer" to the first one
>>> I'd unshared and be able to use the ext4fs it mounted.  This won't
>>> work here.  That's probably best - the appropriate thing to do was
>>> to attach to the existing user_ns.  But it could end up being
>>> limiting in some special cases, so I'm bringing it up here.
>>>
>>> Again I think what you have here is the simplest and most sensible
>>> choice, so ack.
>>>
>>
>> I think I'm missing something.  Why is this separate from mount_may_suid?
>>
>> I can see why it would make sense to check s_user_ns (or maybe
>> s_user_ns *and* the vfsmount namespace) in mount_may_suid, but I don't
>> see why we need separate checks.
>
> So I don't quite understand your concerns that lead to the mnt_may_suid
> patch.  But in my limited understanding there are two distinct issues.

The issue is that we need some kind of control for whether a given
operation should trust a given mounted filesystem.  There are two
kinds of trust: trusting the fs for execve security context (nosuid
controls this) and trusting it for LSM access restrictions.  I think
that, in an unprivileged namespace context, the latter is a bit silly
-- the creator of the fs owns it, full stop.  I'm talking about the
former.

In particular, If I unshare everything, mount a fresh FUSE, shove a
setuid, fcapped, LSM-labeled thing in it, pass a file descriptor out,
and have someone in the root ns execve it, and *pwned*.

My suggestion is to use a single function to control this, and I
called it mnt_may_suid.  We can certainly debate when that function
should return true, but I'm unconvinced that the conditions for LSM
and for regular setuid should be different.

>
> 1) What do file capabilities mean on a filesystem mounted with user
>    namespace privileges.  Where the unprivileged user can control what
>    resides on disk.
>
>    That is what this patch should be about.
>
>    Meaning and restricting those permissions to unprivileged users.

I think that file caps should mean what they usually do if the execve
caller's userns should trust the file.  Otherwise file caps should do
nothing.

My original idea was that a namespace trusts a vfsmount if the
namespace or one of its ancestors created the mount.  Doing the same
thing but with s_user_ns might also make sense.

>
> 2) The second issue that I think your mnt_may_suid patch is about seems
>    to be what to do if a mount winds up in a place we never intended.
>
>    Aka leaks.  I don't think any changes to mnt_may_suid are necessary
>    in that sense.  However they may be useful.
>
>    So I think your mnt_may_suid change may be worth having but so far it
>    seems unnecessary.

There's that, too.  For one thing, with my mnt_may_suid patch (or a
variant that checks the vfsmount and s_user_ns), we could drop the
bind-mount nosuid restrictions.  If you want to bind-mount an
MS_NOSUID mount without MS_NOSUID, then that's fine -- you can't do
any harm.

>
> Which is a long way of saying this patch is fundamentally necessary,
> and I am not certain about  the mnt_may_suid patch.
>
> Am I right in understanding it's purpose?  Or does this patch actually
> succeed in obsoleting it?

Other way around.  I think that an improved mnt_may_suid patch might
render this patch unnecessary.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html