Re: [PATCH 3/8] ucounts: Fix and simplify RLIMIT_NPROC handling during setuid()+execve

"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> · Mon, 14 Feb 2022 11:43:54 -0600

"Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> writes:

> Solar Designer <solar@xxxxxxxxxxxx> writes:
>
>> On Thu, Feb 10, 2022 at 08:13:19PM -0600, Eric W. Biederman wrote:
>>> As of commit 2863643fb8b9 ("set_user: add capability check when
>>> rlimit(RLIMIT_NPROC) exceeds") setting the flag to see if execve
>>> should check RLIMIT_NPROC is buggy, as it tests the capabilites from
>>> before the credential change and not aftwards.
>>> 
>>> As of commit 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of
>>> ucounts") examining the rlimit is buggy as cred->ucounts has not yet
>>> been properly set in the new credential.
>>> 
>>> Make the code correct and more robust moving the test to see if
>>> execve() needs to test RLIMIT_NPROC into commit_creds, and defer all
>>> of the rest of the logic into execve() itself.
>>> 
>>> As the flag only indicateds that RLIMIT_NPROC should be checked
>>> in execve rename it from PF_NPROC_EXCEEDED to PF_NPROC_CHECK.
>>> 
>>> Cc: stable@xxxxxxxxxxxxxxx
>>> Link: https://lkml.kernel.org/r/20220207121800.5079-2-mkoutny@xxxxxxxx
>>> Link: https://lkml.kernel.org/r/20220207121800.5079-3-mkoutny@xxxxxxxx
>>> Reported-by: Michal Koutn?? <mkoutny@xxxxxxxx>
>>> Fixes: 2863643fb8b9 ("set_user: add capability check when rlimit(RLIMIT_NPROC) exceeds")
>>> Fixes: 21d1c5e386bc ("Reimplement RLIMIT_NPROC on top of ucounts")
>>> Signed-off-by: "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx>
>>
>> On one hand, this looks good.
>>
>> On the other, you asked about the Apache httpd suexec scenario in the
>> other thread, and here's what this means for it (per my code review):
>>
>> In that scenario, we have two execve(): first from httpd to suexec, then
>> from suexec to the CGI script.  Previously, the limit check only
>> occurred on the setuid() call by suexec, and its effect was deferred
>> until execve() of the script.  Now wouldn't it occur on both execve()
>> calls, because commit_creds() is also called on execve() (such as in
>> case the program is SUID, which suexec actually is)?
>
> Yes.  Moving the check into commit_creds means that the exec after a
> suid exec will perform an RLIMIT_NPROC check and could possibly fail.  I
> would call that a bug.  Anything happening in execve should be checked
> and handled in execve as execve can fail.
>
> It also points out that our permission checks for increasing
> RLIMIT_NPROC are highly inconsistent.
>
> One set of permissions in fork().
> Another set of permissions in set*id() and delayed until execve.
> No permission checks for the uid change in execve.
>
> Every time I look into the previous behavior of RLIMIT_NPROC I seem
> to find issues.  Currently I am planning a posting to linux-api
> so sorting out what when RLIMIT_NPROC should be enforced and how
> RLIMIT_NPROC gets accounted receives review.  I am also planning a
> feature branch to deal with the historical goofiness.
>
> I really like how cleanly this patch seems to be.  Unfortunately it is
> wrong.

Hmm.  Maybe not as wrong as I thought.  An suid execve does not change
the real user.

Still a bit wrong from a conservative change point of view because the
user namespace can change in setns and CLONE_NEWUSER which will change
the accounting now.  Which with the ucount rlimit stuff changes where
things should be accounted.

I am playing with the idea of changing accounting aka (cred->ucounts &
cred->user) to only change in fork (aka clone without CLONE_THREAD) and
exec.  I think that would make maintenance and  cleaning all of this up
easier.

That would also remove serious complications from RLIMIT_SIGPENDING as
well.

I thought SIGPENDING was only a multi-threaded process issue but from
one signal to the next the set*id() family functions can be called.

Eric