Michal Koutný <mkoutny@xxxxxxxx> writes: > On Mon, Feb 14, 2022 at 09:10:49AM -0600, "Eric W. Biederman" <ebiederm@xxxxxxxxxxxx> wrote: >> I really like how cleanly this patch seems to be. Unfortunately it is >> wrong. > > It seems [1] so: > > setuid() // RLIMIT_NPROC is fine at this moment > ... fork() > ... > ... fork() > execve() // eh, oh > > This "punishes" the exec'ing task although the cause is elsewhere. > > Michal > > [1] The decoupled setuid()+execve() check can be interpretted both ways. > I understood historically the excess at the setuid() moment is > relevant. I have been digging into this to understand why we are doing the strange things we are doing. Ordinarily for rlimits we are fine with letting things go over limit until we reach a case where we need the limit (which would be fork in the RLIMIT_NPROC case). So things like setrlimit do not check your counts to see if you will be over the limit. The practical problem with fork in the unix model is that you can not change limits or do anything for the new process until it is created (with clone/fork). Making it impossible to set the rlimits and change the user before the new process is created. The result is that in applications like apache when they run cgi scripts (as a different user than the apache process) RLIMIT_NPROC did not work until a check was placed into set*id() as well. As the typical cgi script did not fork it just did it's work and exited. That it was discovered that allowing set*id() to fail was a footgun for privileged processes. And we have the existing system. Which leads me to the starting point that set*id() checking rlimits is a necessary but fundamentally a special case. As long as the original use case works I think there is some latitude in the implementation. Maybe we set a flag and perform all of the checks in exec. Maybe we just send SIGKILL. Maybe we just say it is an ugly wart but it is our ugly wart and comment it and leave it alone. I am leaving that decision to a clean-up patchset.