"Serge E. Hallyn" <serge@xxxxxxxxxx> writes: > Quoting Eric W. Biederman (ebiederm@xxxxxxxxxxxx): >> >> When unsharing a user namespace we reduce our credentials to just what >> can be done in that user namespace. This is a subset of the credentials >> we previously had. Teach commit_creds to recognize this is a subset >> of the credentials we have had before and don't clear the dumpability flag. >> >> This allows an unprivileged program to do: >> unshare(CLONE_NEWUSER); >> fd = open("/proc/self/uid_map", O_RDWR); >> >> Where previously opening the uid_map writable would fail because >> the the task had been made non-dumpable. >> >> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> > > Acked-by: Serge Hallyn <serge.hallyn@xxxxxxxxxxxxx> > >> --- >> kernel/cred.c | 26 +++++++++++++++++++++++++- >> 1 files changed, 25 insertions(+), 1 deletions(-) >> >> diff --git a/kernel/cred.c b/kernel/cred.c >> index 48cea3d..993a7ea41 100644 >> --- a/kernel/cred.c >> +++ b/kernel/cred.c >> @@ -455,6 +455,30 @@ error_put: >> return ret; >> } >> > > Do you think we need to warn that this can only be used for > commit_creds? (i.e. if someone tried ot use this in some > other context, the 'creds are subset of target ns is a child > of current_ns' assumption would be wrong) This function should be a general test valid at any time. Except that I forgot the bit of the test that asks is the original cred the owner of the subset user namespace. I will respin this patch. As a small segway this property that unshare(CLONE_NEWUSER) results in a subset of the capabilities a process already had is a very important property to make it possible to reason about user namespaces. Maintaining this property is the reason behind the choices I made in fixing cap_capable. >> +static bool cred_cap_issubset(const struct cred *set, const struct cred *subset) >> +{ >> + const struct user_namespace *set_ns = set->user_ns; >> + const struct user_namespace *subset_ns = subset->user_ns; >> + >> + /* If the two credentials are in the same user namespace see if >> + * the capabilities of subset are a subset of set. >> + */ >> + if (set_ns == subset_ns) >> + return cap_issubset(subset->cap_permitted, set->cap_permitted); >> + >> + /* The credentials are in a different user namespaces > > This can only happen during setns and CLONE_NEWUSER right? Right. This can only happen during setns, unshare(CLONE_NEWUSER), and possibly during clone. Otherwise we are not changing the user namespace. However for clarity and robustness I don't want the code to rely on that. >> + * therefore one is a subset of the other only if a set is an >> + * ancestor of subset. >> + */ > >> + while (subset_ns != &init_user_ns) { >> + if (set_ns == subset_ns->parent) >> + return true; >> + subset_ns = subset_ns->parent; >> + } >> + >> + return false; >> +} >> + >> /** >> * commit_creds - Install new credentials upon the current task >> * @new: The credentials to be assigned >> @@ -493,7 +517,7 @@ int commit_creds(struct cred *new) >> !gid_eq(old->egid, new->egid) || >> !uid_eq(old->fsuid, new->fsuid) || >> !gid_eq(old->fsgid, new->fsgid) || >> - !cap_issubset(new->cap_permitted, old->cap_permitted)) { >> + !cred_cap_issubset(old, new)) { >> if (task->mm) >> set_dumpable(task->mm, suid_dumpable); >> task->pdeath_signal = 0; >> -- >> 1.7.5.4 _______________________________________________ Containers mailing list Containers@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/containers