pid_max is something of a legacy limit (its value, and partially the concept too, given the existence of the pids cgroup controller). It is tempting to make the pid_max value part of a pid namespace to provide a compat environment for 32-bit applications [1]. On the other hand, that provides yet another mechanism for limiting the task count. Even without namespacing of the pid_max value, configuring a deliberate limit is confusing for users [2].

This series builds upon the idea of restricting the number (amount) of tasks with the pids controller and ensuring that the number (pid) never exceeds the amount of tasks. This does not currently work out of the box because next-fit pid allocation keeps assigning numbers (pids) higher than the actual amount of tasks, leaving gaps in the lower range of the interval.

Patch 2/2 implements this idea by extending the semantics of the ns_last_pid knob to allow first-fit numbering. (The implementation has clumsy ifdefery, which might be dropped since it is too x86-centric.)

Patch 1/2 is a mere revert that simplifies pid_max back to one global limit.

(I pruned the Cc: list from scripts/get_maintainer.pl for better focus; feel free to bounce as necessary.)

[1] https://lore.kernel.org/r/20241122132459.135120-1-aleksandr.mikhalitsyn@xxxxxxxxxxxxx/
[2] https://lore.kernel.org/r/bnxhqrq7tip6jl2hu6jsvxxogdfii7ugmafbhgsogovrchxfyp@kagotkztqurt/

Michal Koutný (2):
  Revert "pid: allow pid_max to be set per pid namespace"
  pid: Optional first-fit pid allocation

 Documentation/admin-guide/sysctl/kernel.rst |   2 +
 include/linux/pid.h                         |   3 +
 include/linux/pid_namespace.h               |  11 +-
 kernel/pid.c                                | 137 +++----------------------------------
 kernel/pid_namespace.c                      |  71 +++++-----
 kernel/sysctl.c                             |   9 ++
 kernel/trace/pid_list.c                     |   2 +-
 kernel/trace/trace.h                        |   2 +
 kernel/trace/trace_sched_switch.c           |   2 +-
 9 files changed, 70 insertions(+), 169 deletions(-)

base-commit: 334426094588f8179fe175a09ecc887ff0c75758
-- 
2.48.1