Re: [PATCH 3/7] fs: Ignore file caps in mounts from other user namespaces

ebiederm@xxxxxxxxxxxx (Eric W. Biederman) · Thu, 16 Jul 2015 00:44:49 -0500

Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:

> On Wed, Jul 15, 2015 at 10:04 PM, Eric W. Biederman
> <ebiederm@xxxxxxxxxxxx> wrote:
>> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes:
>>
>>>
>>> So here's the semantic question:
>>>
>>> Suppose an unprivileged user (uid 1000) creates a user namespace and a
>>> mount namespace.  They stick a file (owned by uid 1000 as seen by
>>> init_user_ns) in there and mark it setuid root and give it fcaps.
>>
>> To make this make sense I have to ask, is this file on a filesystem
>> where uid 1000 as seen by the init_user_ns stored as uid 1000 on
>> the filesystem?  Or is this uid 0 as seen by the filesystem?
>>
>> I assume this is uid 0 on the filesystem in question or else your
>> unprivileged user would not have sufficient privileges over the
>> filesystem to setup fcaps.
>
> I was thinking uid 0 as seen by the filesystem.  But even if it were
> uid 1000, the unprivileged user can still set whatever mode and xattrs
> they want -- they control the backing store.

Yes.   And that is what I was really asking.  Are we taking about a
filesystem where the user controls the backing store?

>>> Then global root gets an fd to this filesystem.  If they execve the
>>> file directly, then, with my patch 4, it won't act as setuid 1000 and
>>> the fcaps will be ignored.  Even with my patch 4, though, if they bind
>>> mount the fs and execve the file from their bind mount, it will act as
>>> setuid 1000.  Maybe this is odd.  However, with Seth's patch 3, the
>>> fcaps will (correctly) not be honored.
>>
>> With patch 3 you can also think of it as fcaps being honored and you
>> get all the caps in the appropriate user namespace, but since you are
>> not in that user namespace and so don't have a place to store them
>> in struct cred you don't get the file caps.
>>
>> From the philosophy of interpreting the file as defined by the
>> filesystem in principle we could extend struct cred so you actually
>> get the creds just in uid 1000s user namespace, but that is very
>> unlikely to be worth it.
>
> I agree.
>
>>
>>> I tend to thing that, if we're not honoring the fcaps, we shouldn't be
>>> honoring the setuid bit either.  After all, it's really not a trusted
>>> file, even though the only user who could have messed with it really
>>> is the apparent owner.
>>
>> For the file caps we can't honor them because you don't have the bits
>> in struct cred.
>>
>> For setuid we can honor it, and setuid is something that the user
>> namespace allows.
>>
>
> We certainly *can* honor it.  But why should we?  I'd be more
> comfortable with this if the contents of an untrusted filesystem were
> really treated as just data.

In these weird bleed through situtations I don't know that we should.
But extending nosuid protections in this way is a bit like yama
a bit gratuitious stomping don't care cases in the semantics to
make bugs harder to exploit.

>>> And, if we're going to say we don't trust the file and shouldn't honor
>>> setuid or fcaps, then merging all the functionality into mnt_may_suid
>>> could make sense.  Yes, these two things do different things, but they
>>> could hook in to the same place.
>>
>> There are really two separate questions:
>> - Do we trust this filesystem?
>> - Do you have the bits to implement this concept?
>>
>> Even if in this specific context the two questions wind up looking
>> exactly the same. I think it makes a lot of sense to ask the two
>> questions separately.  As future maintenance changes may cause the
>> implementation of the questions to diverge.
>>
>
> Agreed.
>
> Unless someone thinks of an argument to the contrary, I'd say "no, we
> don't trust this filesystem".  I could be convinced otherwise.

But this is context dependent.  From the perspective of the container
we really do want to trust the filesystem.  As the container root set it
up, and if he isn't being hostile likely has a use for setfcaps files
and setuid files and all of the rest.

Perhaps I should phrase it as:
- In this context do we trust the code?   AKA mnt_may_suid?
- What do these bits mean in this context?  (Usually something more complicated).

Which says to me we want both patches 3 and 4 (even if 4 uses s_user_ns)
because 3 is different than 4.

And now I better context switch back to fixing bind mounts.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html