Hi Michal,

On Mon, Feb 07, 2022 at 01:17:55PM +0100, Michal Koutný wrote:
> The check is currently against the current->cred but since those are
> going to change and we want to check RLIMIT_NPROC condition after the
> switch, supply the capability check with the new cred.
> But since we're checking new_user being INIT_USER any new cred's
> capability-based allowance may be redundant when the check fails and the
> alternative solution would be revert of the commit 2863643fb8b9
> ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
>
> Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
>
> Cc: Solar Designer <solar@xxxxxxxxxxxx>
> Cc: Christian Brauner <christian.brauner@xxxxxxxxxx>
> Signed-off-by: Michal Koutný <mkoutny@xxxxxxxx>
> ---
>  kernel/sys.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 8ea20912103a..48c90dcceff3 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -481,7 +481,8 @@ static int set_user(struct cred *new)
>  	 */
>  	if (ucounts_limit_cmp(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) >= 0 &&
>  			new_user != INIT_USER &&
> -			!capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN))
> +	    !security_capable(new, &init_user_ns, CAP_SYS_RESOURCE, CAP_OPT_NONE) &&
> +	    !security_capable(new, &init_user_ns, CAP_SYS_ADMIN, CAP_OPT_NONE))
>  		current->flags |= PF_NPROC_EXCEEDED;
>  	else
>  		current->flags &= ~PF_NPROC_EXCEEDED;

Thank you for working on this and CC'ing me on it.  This is related to
the discussion Christian and I had in September:

https://lore.kernel.org/all/20210913100140.bxqlg47pushoqa3r@wittgenstein/

Christian was going to revert 2863643fb8b9, but apparently that never
happened.  Back then, I also suggested:

"Alternatively, we could postpone the set_user() calls until we're
running with the new user's capabilities, but that's an invasive change
that's likely to create its own issues."

The change you propose above is similar to that, but is more limited and
non-invasive.  That looks good to me.

However, I think you need to drop the negations of the return value from
security_capable().  security_capable() returns 0 or -EPERM, while
capable() returns a bool, in kernel/capability.c: ns_capable_common():

	capable = security_capable(current_cred(), ns, cap, opts);
	if (capable == 0) {
		current->flags |= PF_SUPERPRIV;
		return true;
	}
	return false;

Also, your change would result in this no longer setting PF_SUPERPRIV.
This may be fine, but you may want to document it.

On a related note, this comment in security/commoncap.c needs an update:

 * NOTE WELL: cap_has_capability() cannot be used like the kernel's capable()
 * and has_capability() functions.  That is, it has the reverse semantics:
 * cap_has_capability() returns 0 when a task has a capability, but the
 * kernel's capable() and has_capability() returns 1 for this case.

cap_has_capability() doesn't actually exist, and perhaps the comment
should refer to cap_capable().

Alexander
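
P.S. To make the return-value point above concrete, here is a minimal
userspace sketch.  fake_security_capable() and fake_capable() are
hypothetical stand-ins that only mimic the two return conventions (0 or
-EPERM versus bool); they are not the kernel functions, and the
exceeded_*() helper names are made up for illustration.  Each helper
models only the capability part of the condition that sets
PF_NPROC_EXCEEDED.

```c
#include <errno.h>
#include <stdbool.h>

/* Stand-in (assumption): mimics security_capable()'s convention,
 * 0 when the capability is held, -EPERM when it is not. */
static int fake_security_capable(bool has_cap)
{
	return has_cap ? 0 : -EPERM;
}

/* Stand-in (assumption): mimics capable()'s convention,
 * true when the capability is held. */
static bool fake_capable(bool has_cap)
{
	return fake_security_capable(has_cap) == 0;
}

/* Pre-patch shape: true only when neither capability is held. */
static bool exceeded_with_capable(bool has_resource, bool has_admin)
{
	return !fake_capable(has_resource) && !fake_capable(has_admin);
}

/* Shape of the hunk as posted: the negations are kept, so the result
 * is true only when BOTH capabilities are held, which is inverted. */
static bool exceeded_as_posted(bool has_resource, bool has_admin)
{
	return !fake_security_capable(has_resource) &&
	       !fake_security_capable(has_admin);
}

/* Negations dropped: nonzero (-EPERM) means "not capable", so this
 * matches the pre-patch truth table again. */
static bool exceeded_negations_dropped(bool has_resource, bool has_admin)
{
	return fake_security_capable(has_resource) &&
	       fake_security_capable(has_admin);
}
```

With neither capability held, exceeded_with_capable() and
exceeded_negations_dropped() both report true, while
exceeded_as_posted() reports false.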