"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> writes: > Solar Designer <solar@xxxxxxxxxxxx> writes: > >> On Thu, Feb 10, 2022 at 08:13:19PM -0600, Eric W. Biederman wrote: >>> As of commit 2863643fb8b9 ("set_user: add capability check when >>> rlimit(RLIMIT_NPROC) exceeds") setting the flag to see if execve >>> should check RLIMIT_NPROC is buggy, as it tests the capabilites from >>> before the credential change and not aftwards. >>> >>> As of commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of >>> ucounts") examining the rlimit is buggy as cred->ucounts has not yet >>> been properly set in the new credential. >>> >>> Make the code correct and more robust moving the test to see if >>> execve() needs to test RLIMIT_NPROC into commit_creds, and defer all >>> of the rest of the logic into execve() itself. >>> >>> As the flag only indicateds that RLIMIT_NPROC should be checked >>> in execve rename it from PF_NPROC_EXCEEDED to PF_NPROC_CHECK. >>> >>> Cc: stable@xxxxxxxxxxxxxxx >>> Link: https://lkml.kernel.org/r/20220207121800.5079-2-mkoutny@xxxxxxxx >>> Link: https://lkml.kernel.org/r/20220207121800.5079-3-mkoutny@xxxxxxxx >>> Reported-by: Michal Koutn?? <mkoutny@xxxxxxxx> >>> Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds") >>> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts") >>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> >> >> On one hand, this looks good. >> >> On the other, you asked about the Apache httpd suexec scenario in the >> other thread, and here's what this means for it (per my code review): >> >> In that scenario, we have two execve(): first from httpd to suexec, then >> from suexec to the CGI script. Previously, the limit check only >> occurred on the setuid() call by suexec, and its effect was deferred >> until execve() of the script. Now wouldn't it occur on both execve() >> calls, because commit_creds() is also called on execve() (such as in >> case the program is SUID, which suexec actually is)? > > Yes. Moving the check into commit_creds means that the exec after a > suid exec will perform an RLIMIT_NPROC check and could possibly fail. I > would call that a bug. Anything happening in execve should be checked > and handled in execve as execve can fail. > > It also points out that our permission checks for increasing > RLIMIT_NPROC are highly inconsistent. > > One set of permissions in fork(). > Another set of permissions in set*id() and delayed until execve. > No permission checks for the uid change in execve. > > Every time I look into the previous behavior of RLIMIT_NPROC I seem > to find issues. Currently I am planning a posting to linux-api > so sorting out what when RLIMIT_NPROC should be enforced and how > RLIMIT_NPROC gets accounted receives review. I am also planning a > feature branch to deal with the historical goofiness. > > I really like how cleanly this patch seems to be. Unfortunately it is > wrong. Hmm. Maybe not as wrong as I thought. An suid execve does not change the real user. Still a bit wrong from a conservative change point of view because the user namespace can change in setns and CLONE_NEWUSER which will change the accounting now. Which with the ucount rlimit stuff changes where things should be accounted. I am playing with the idea of changing accounting aka (cred->ucounts & cred->user) to only change in fork (aka clone without CLONE_THREAD) and exec. I think that would make maintenance and cleaning all of this up easier. That would also remove serious complications from RLIMIT_SIGPENDING as well. I thought SIGPENDING was only a multi-threaded process issue but from one signal to the next the set*id() family functions can be called. Eric