Solar Designer <solar@xxxxxxxxxxxx> writes: > Hi Michal, > > On Mon, Feb 07, 2022 at 01:17:55PM +0100, Michal Koutný wrote: >> The check is currently against the current->cred but since those are >> going to change and we want to check RLIMIT_NPROC condition after the >> switch, supply the capability check with the new cred. >> But since we're checking new_user being INIT_USER any new cred's >> capability-based allowance may be redundant when the check fails and the >> alternative solution would be revert of the commit 2863643fb8b9 >> ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds") >> >> Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds") >> >> Cc: Solar Designer <solar@xxxxxxxxxxxx> >> Cc: Christian Brauner <christian.brauner@xxxxxxxxxx> >> Signed-off-by: Michal Koutný <mkoutny@xxxxxxxx> >> --- >> kernel/sys.c | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/kernel/sys.c b/kernel/sys.c >> index 8ea20912103a..48c90dcceff3 100644 >> --- a/kernel/sys.c >> +++ b/kernel/sys.c >> @@ -481,7 +481,8 @@ static int set_user(struct cred *new) >> */ >> if (ucounts_limit_cmp(new->ucounts, UCOUNT_RLIMIT_NPROC, rlimit(RLIMIT_NPROC)) >= 0 && >> new_user != INIT_USER && >> - !capable(CAP_SYS_RESOURCE) && !capable(CAP_SYS_ADMIN)) >> + !security_capable(new, &init_user_ns, CAP_SYS_RESOURCE, CAP_OPT_NONE) && >> + !security_capable(new, &init_user_ns, CAP_SYS_ADMIN, CAP_OPT_NONE)) >> current->flags |= PF_NPROC_EXCEEDED; >> else >> current->flags &= ~PF_NPROC_EXCEEDED; > > Thank you for working on this and CC'ing me on it. This is related to > the discussion Christian and I had in September: > > https://lore.kernel.org/all/20210913100140.bxqlg47pushoqa3r@wittgenstein/ > > Christian was going to revert 2863643fb8b9, but apparently that never > happened. Back then, I also suggested: > > "Alternatively, we could postpone the set_user() calls until we're > running with the new user's capabilities, but that's an invasive change > that's likely to create its own issues." I really think we need to do something like that. Probably just set a flag in commit_creds and test later. I was working on fixes that looked cleaner and I just recently realized that the test in fork is almost as bad. The function has_capability can be used but the same kind of problems exist. I thought I was very quickly going to have patches to post but I need to redo everything now that I have noticed the issue in fork, so it will be a day or so. Eric > The change you propose above is similar to that, but is more limited and > non-invasive. That looks good to me. > > However, I think you need to drop the negations of the return value from > security_capable(). security_capable() returns 0 or -EPERM, while > capable() returns a bool, in kernel/capability.c: ns_capable_common(): > > capable = security_capable(current_cred(), ns, cap, opts); > if (capable == 0) { > current->flags |= PF_SUPERPRIV; > return true; > } > return false; > > Also, your change would result in this no longer setting PF_SUPERPRIV. > This may be fine, but you could want to document it. > > On a related note, this comment in security/commoncap.c needs an update: > > * NOTE WELL: cap_has_capability() cannot be used like the kernel's capable() > * and has_capability() functions. That is, it has the reverse semantics: > * cap_has_capability() returns 0 when a task has a capability, but the > * kernel's capable() and has_capability() returns 1 for this case. > > cap_has_capability() doesn't actually exist, and perhaps the comment > should refer to cap_capable(). > > Alexander