On Tue, Dec 2, 2014 at 2:48 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: > Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes: > >> On Tue, Dec 2, 2014 at 1:26 PM, Eric W. Biederman <ebiederm@xxxxxxxxxxxx> wrote: >>> Andy Lutomirski <luto@xxxxxxxxxxxxxx> writes: >>> >>>> On Tue, Dec 2, 2014 at 12:25 PM, Eric W. Biederman >>>> <ebiederm@xxxxxxxxxxxx> wrote: >>>>> >>>>> Classic unix permission checks have an interesting feature, the group >>>>> permissions for a file can be set to less than the other permissions >>>>> on a file. Occassionally this is used deliberately to give a certain >>>>> group of users fewer permissions than the default. >>>>> >>>>> Overlooking negative groups has resulted in the permission checks for >>>>> setting up a group mapping in a user namespace to be too lax. Tighten >>>>> the permission checks in new_idmap_permitted to ensure that mapping >>>>> uids and gids into user namespaces without privilege will not result >>>>> in new combinations of credentials being available to the users. >>>>> >>>>> When setting mappings without privilege only the creator of the user >>>>> namespace is interesting as all other users that have CAP_SETUID over >>>>> the user namespace will also have CAP_SETUID over the user namespaces >>>>> parent. So the scope of the unprivileged check is reduced to just >>>>> the case where cred->euid is the namespace creator. >>>>> >>>>> For setting a uid mapping without privilege only euid is considered as >>>>> setresuid can set uid, suid and fsuid from euid without privielege >>>>> making any combination of uids possible with user namespaces already >>>>> possible without them. >>>>> >>>>> For now seeting a gid mapping without privilege is removed. The only >>>>> possible set of credentials that would be safe without a gid mapping >>>>> (egid without any supplementary groups) just doesn't happen in practice >>>>> so would simply lead to unused untested code. >>>>> >>>>> setgroups is modified to fail not only when the group ids do not >>>>> map but also when there are no gid mappings at all, preventing >>>>> setgroups(0, NULL) from succeeding when gid mappings have not been >>>>> established. >>>>> >>>>> For a small class of applications this change breaks userspace >>>>> and removes useful functionality. This small class of applications >>>>> includes tools/testing/selftests/mount/unprivileged-remount-test.c >>>>> >>>>> Most of the removed functionality will be added back with the >>>>> addition of a one way knob to disable setgroups. Once setgroups >>>>> is disabled setting the gid_map becomes as safe as setting the uid_map. >>>>> >>>>> For more common applications that set the uid_map and the gid_map with >>>>> privilege this change will have no effect on them. >>>>> >>>>> This should fix CVE-2014-8989. >>>>> >>>>> Cc: stable@xxxxxxxxxxxxxxx >>>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >>>>> --- >>>> >>>>> >>>>> +static inline bool gid_mapping_possible(const struct user_namespace *ns) >>>>> +{ >>>>> + return ns->gid_map.nr_extents != 0; >>>>> +} >>>>> + >>>> >>>> Can you rename this to userns_may_setgroups or something like that? >>>> To me, gid_mapping_possible sounds like you're allowed to map gids, >>>> which sounds like the opposite condition, and it doesn't explain what >>>> the point is. >>> >>> gid_mapping_established? >>> >>> What I mean to be testing if is if from_kgid and make_kgid will work >>> because the gid mappings have been set. >> >> But why do you care whether from_kgid and make_kgid will work? If >> they fail, then they fail. I think that the point is that you're >> checking whether allowing setgroups to drop groups is safe, and that's >> only barely the same condition. > > For all of the system calls to set or change uids and gids except > setgroups it happens to fall out that if there are no mappings set the > system calls fail. That is and was deliberate. However setgroups is > weird because it allows the case of 0 mappings and to maintain the > constraint that it fails when there are no mapping set (just like > everything else) that requires an additional test. > >>> The userns knob for setgroups is a different test and is added >>> in the next patch. And yes we really need both or the knob can >>> start out as on, and we need to provent setgroups(0, NULL) >>> before the user namespace is unshared. >> >> Do you mean before it's mapped? > > Right we need to prevent setgroups(0, NULL) before we set the gid > mapping. Fair enough. If you factor this into a separate inline helper, it might be worth adding a short comment to that effect. It could be as simple as: static inline bool whatever(whatever) { if (mapping is empty) return false; /* setgroups with a nonempty set requires a mapping; make sure that setgroups(0, NULL) does, too. */ ...; } > >>> Although come to think about it probably makes sense to roll those two >>> test into one function and call that inline function from the setgroups >>> implementation. >> >> That's what I think, too. >> >>> >>> Anyway I will think about it and see what I can do to make it easily >>> comprehensible. >>> >>>>> diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c >>>>> index aa312b0dc3ec..51d65b444951 100644 >>>>> --- a/kernel/user_namespace.c >>>>> +++ b/kernel/user_namespace.c >>>>> @@ -812,16 +812,19 @@ static bool new_idmap_permitted(const struct file *file, >>>>> struct user_namespace *ns, int cap_setid, >>>>> struct uid_gid_map *new_map) >>>>> { >>>>> - /* Allow mapping to your own filesystem ids */ >>>>> - if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1)) { >>>>> + const struct cred *cred = file->f_cred; >>>>> + >>>>> + /* Allow a mapping without capabilities when allowing the root >>>>> + * of the user namespace capabilities restricted to that id >>>>> + * will not change the set of credentials available to that >>>>> + * user. >>>>> + */ >>>>> + if ((new_map->nr_extents == 1) && (new_map->extent[0].count == 1) && >>>>> + uid_eq(ns->owner, cred->euid)) { >>>> >>>> What's uid_eq(ns->owner, cred->euid)) for? This should already be covered by: >>> >>> This means that the only user we attempt to set up unprivileged mappings >>> for is the owner of the user namespace. Anyone else should already >>> have capabilities in the parent user namespace or shouldn't be able to >>> set the mapping at all. >>> >>> In practice it is a clarification to make it easier to think about the code. >> >> But why? I don't see why this check is necessary or why it's relevant >> to the current issue. > > My goal in this check is to guarantee that any combination of uids and > gids in struct cred that you can obtain with mappings and a user > namespace you can also obtain without privilege without a user > namespace. > > What limiting euid to ns->owner does is it guarantees that when a user > namespace is created without privilege root doesn't come along and set > the mapping using the unprivileged path. That is confusing to think > about and it is not necessary to support. > > With ns->owner == euid I have the guarantee especially with the gids > that they wind up paired with a uid in struct cred that came from the > same user. Either that or someone set one of the mappings with > privilege. > > With ns->owner == euid I can verify all of these things pretty > trivially. Without that check I don't have a clue how to verify > the pairing between uids and gids in the unprivileged mapping. > > Does that make things clearer? Yes. Thanks. It might pay to try to improve the comment. I understand it with this explanation but I didn't when I just read the comment. > >>>> if (cap_valid(cap_setid) && !file_ns_capable(file, ns, CAP_SYS_ADMIN)) >>>> goto out; >>>> >>>> (except that I don't know why cap_valid(cap_setid) is checked -- this >>>> ought to be enforced for projid_map, too, right?) >>> >>> What to do with projid_map is entirely different discussion. In >>> practice it is dead, and either XFS needs to be fixed to use it >>> or that code needs to be removed. At the time I wrote it XFS >>> did not require any privileges to set project ids. >>> >>>>> u32 id = new_map->extent[0].lower_first; >>>>> if (cap_setid == CAP_SETUID) { >>>>> kuid_t uid = make_kuid(ns->parent, id); >>>>> - if (uid_eq(uid, file->f_cred->fsuid)) >>>>> - return true; >>>>> - } else if (cap_setid == CAP_SETGID) { >>>>> - kgid_t gid = make_kgid(ns->parent, id); >>>>> - if (gid_eq(gid, file->f_cred->fsgid)) >>>>> + if (uid_eq(uid, cred->euid)) >>>> >>>> Why'd you change this from fsuid to euid? >>> >>> Because strangely enough I can set euid to any other uid with >>> setresuid, but the same does not hold with fsuid. >>> >>> So strictly speaking fsuid was actually wrong before. In practice >>> fsuid == euid so I don't think anyone will care. But I want very much >>> to enforce the rule that user namespaces can't give you any credentials >>> you couldn't get otherwise. >> >> Fair enough. Want to split that into a separate patch, then? > > Strictly speaking it is part and parcel of the same thing but it > probably makes sense to split it out and emphasise and explain the > change. Sounds good. Anyway, time to do a combination of Real Work (tm) and dealing with the fact that I found a whole family of vulnerabilities of as-yet-unknown severity in arch/x86 this morning. --Andy _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers