The kernel currently appears to have more than 1000 capable() calls, of
which only a handful have been converted to ns_capable(). Moreover, it
probably does not make sense to convert most of these calls to be
namespace aware: given the nature of the physical resources they
control, capable() is the right question to ask.

Yet, in order to build 'fully functional real device'-like containers,
user namespaces sometimes need access to real system resources. Thus,
one potential candidate for enabling access to physical resources from
a user namespace would be to use the namespace's own CAP_SYS_RESOURCE
as a magic token that makes the task's capabilities valid for the
initial user namespace (init_user_ns).

Signed-off-by: Janne Karhunen <Janne.Karhunen@xxxxxxxxx>
---
 kernel/user_namespace.c |    8 ++++++++
 security/commoncap.c    |   18 ++++++++++++++++--
 2 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/kernel/user_namespace.c b/kernel/user_namespace.c
index d8c30db..f7281fd 100644
--- a/kernel/user_namespace.c
+++ b/kernel/user_namespace.c
@@ -43,6 +43,14 @@ static void set_cred_user_ns(struct cred *cred, struct user_namespace *user_ns)
 	key_put(cred->request_key_auth);
 	cred->request_key_auth = NULL;
 #endif
+
+	/* Since CAP_SYS_RESOURCE is the way out of user_ns, we start off having
+	 * it disabled.
+	 */
+	cap_lower (cred->cap_effective, CAP_SYS_RESOURCE);
+	cap_lower (cred->cap_permitted, CAP_SYS_RESOURCE);
+	cap_lower (cred->cap_inheritable, CAP_SYS_RESOURCE);
+
 	/* tgcred will be cleared in our caller bc CLONE_THREAD won't be set */
 	cred->user_ns = user_ns;
 }
diff --git a/security/commoncap.c b/security/commoncap.c
index c44b6fe..cdacb2d 100644
--- a/security/commoncap.c
+++ b/security/commoncap.c
@@ -83,9 +83,18 @@ int cap_capable(const struct cred *cred, struct user_namespace *targ_ns,
 	 * user namespace's parents.
 	 */
 	for (;;) {
-		/* Do we have the necessary capabilities? */
+		/* If we belong in this ns, do we have the capability? */
 		if (ns == cred->user_ns)
 			return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
+		else {
+			/* User_ns asking for rights in init_ns? */
+			if (ns == &init_user_ns) {
+				if (cap_raised(cred->cap_effective, CAP_SYS_RESOURCE))
+					return cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
+				else
+					return -EPERM;
+			}
+		}
 
 		/* Have we tried all of the parent namespaces? */
 		if (ns == &init_user_ns)
@@ -481,7 +490,7 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 	const struct cred *old = current_cred();
 	struct cred *new = bprm->cred;
 	bool effective, has_cap = false;
-	int ret;
+	int ret, has_res;
 	kuid_t root_uid;
 
 	effective = false;
@@ -501,6 +510,8 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 			warn_setuid_and_fcaps_mixed(bprm->filename);
 			goto skip;
 		}
+		has_res = cap_raised(new->cap_permitted, CAP_SYS_RESOURCE);
+
 		/*
 		 * To support inheritance of root-permissions and suid-root
 		 * executables under compatibility mode, we override the
@@ -512,6 +523,9 @@ int cap_bprm_set_creds(struct linux_binprm *bprm)
 			/* pP' = (cap_bset & ~0) | (pI & ~0) */
 			new->cap_permitted = cap_combine(old->cap_bset,
 							 old->cap_inheritable);
+
+			if (!has_res && (old->user_ns != &init_user_ns))
+				cap_lower (new->cap_permitted, CAP_SYS_RESOURCE);
 		}
 		if (uid_eq(new->euid, root_uid))
 			effective = true;
-- 
1.7.9.5
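
For anyone who wants to try the proposed check outside the kernel, here is a minimal user-space sketch of how the patched cap_capable() loop would decide. It is not part of the patch. The model_* types and functions are hypothetical stand-ins for struct cred, struct user_namespace and cap_raised(); the walk to the parent namespace is inferred from the context comments in the hunk; and the namespace-owner shortcut that the real function may contain is deliberately left out. Only the two capability numbers are taken from <linux/capability.h>.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Capability numbers as defined in <linux/capability.h>. */
#define MODEL_CAP_SYS_ADMIN    21
#define MODEL_CAP_SYS_RESOURCE 24
#define MODEL_CAP_BIT(c)       (UINT64_C(1) << (c))

/* Hypothetical stand-in for struct user_namespace. */
struct model_ns {
	const struct model_ns *parent;	/* NULL only for the initial namespace */
	unsigned int owner;		/* uid that created the namespace */
};

/* Hypothetical stand-in for struct cred. */
struct model_cred {
	const struct model_ns *user_ns;
	unsigned int euid;
	uint64_t cap_effective;
};

/* Plays the role of init_user_ns; every namespace chain must end here. */
static const struct model_ns model_init_ns = { NULL, 0 };

static bool model_cap_raised(uint64_t set, int cap)
{
	return (set >> cap) & 1;
}

/* Returns 0 if the capability is granted over targ_ns, -EPERM otherwise. */
int model_cap_capable(const struct model_cred *cred,
		      const struct model_ns *targ_ns, int cap)
{
	const struct model_ns *ns = targ_ns;

	for (;;) {
		/* If we belong in this ns, do we have the capability? */
		if (ns == cred->user_ns)
			return model_cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;

		/*
		 * Proposed behaviour: a task from another namespace asking
		 * for rights in the initial namespace needs CAP_SYS_RESOURCE
		 * as the token, plus the requested capability itself.
		 */
		if (ns == &model_init_ns) {
			if (!model_cap_raised(cred->cap_effective,
					      MODEL_CAP_SYS_RESOURCE))
				return -EPERM;
			return model_cap_raised(cred->cap_effective, cap) ? 0 : -EPERM;
		}

		/* Otherwise retry against the parent namespace. */
		ns = ns->parent;
	}
}

/* Exercises the model with a single child namespace below the initial one. */
void model_selftest(void)
{
	const struct model_ns child = { &model_init_ns, 1000 };
	struct model_cred in_child = {
		&child, 1000, MODEL_CAP_BIT(MODEL_CAP_SYS_ADMIN)
	};
	const struct model_cred in_init = {
		&model_init_ns, 0, MODEL_CAP_BIT(MODEL_CAP_SYS_ADMIN)
	};

	/* Without the token, nothing is valid against the initial namespace. */
	assert(model_cap_capable(&in_child, &model_init_ns,
				 MODEL_CAP_SYS_ADMIN) == -EPERM);

	/* With the token, the task's own capabilities become valid there. */
	in_child.cap_effective |= MODEL_CAP_BIT(MODEL_CAP_SYS_RESOURCE);
	assert(model_cap_capable(&in_child, &model_init_ns,
				 MODEL_CAP_SYS_ADMIN) == 0);

	/* Checks inside the task's own namespace are unchanged. */
	assert(model_cap_capable(&in_child, &child, MODEL_CAP_SYS_ADMIN) == 0);

	/* A capability held in a parent still covers the child namespace. */
	assert(model_cap_capable(&in_init, &child, MODEL_CAP_SYS_ADMIN) == 0);
}
```

Calling model_selftest() from a small main() is enough to run it. Note that in this model the token alone grants nothing: a task holding only CAP_SYS_RESOURCE is still refused any other capability it asks for in the initial namespace.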