On Wed, Aug 19, 2020 at 08:32:59AM -0500, Eric W. Biederman wrote:
Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:
On Wed, Aug 19, 2020 at 10:45:56AM +0200, Christian Brauner wrote:
On Wed, Aug 19, 2020 at 09:43:40AM +0200, peterz@xxxxxxxxxxxxx wrote:
On Tue, Aug 18, 2020 at 06:44:47PM +0100, Matthew Wilcox wrote:
On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote:
The only remaining function callable outside of kernel/fork.c is
_do_fork(). It doesn't really follow the naming of kernel-internal
syscall helpers as Christoph rightly pointed out. Switch all callers and
references to kernel_clone() and remove _do_fork() once and for all.
My only concern is around return type. long, int, pid_t ... can we
choose one and stick to it? pid_t is probably the right return type
within the kernel, despite the return type of clone3(). It'll save us
some work if we ever go through the hassle of growing pid_t beyond 31 bits.
We have at least the futex ABI restricting PID space to 30 bits.
Ok, looking into kernel/futex.c I see
pid_t pid = uval & FUTEX_TID_MASK;
which is probably what this refers to and /proc/sys/kernel/threads-max
is restricted to FUTEX_TID_MASK.
Afaict, that doesn't block switching kernel_clone() to return pid_t. It
can't create anything > FUTEX_TID_MASK anyway without yelling EAGAIN at
userspace. But it means that _if_ we were to change the size of pid_t
we'd likely need a new futex API.
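To make the 30-bit restriction concrete, here is a minimal userspace sketch of how a futex word splits into flag bits and the owner TID. The three constants match include/uapi/linux/futex.h; the helpers futex_owner_tid() and tid_fits_futex_word() are hypothetical names made up for illustration and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/* Constants as defined in include/uapi/linux/futex.h. */
#define FUTEX_WAITERS    0x80000000u
#define FUTEX_OWNER_DIED 0x40000000u
#define FUTEX_TID_MASK   0x3fffffffu

/* The low 30 bits carry the TID; the top two bits are flags. */
static_assert(FUTEX_TID_MASK == (1u << 30) - 1, "TID field is 30 bits wide");

/* Hypothetical helper: extract the owner TID, as kernel/futex.c does inline. */
static inline int32_t futex_owner_tid(uint32_t uval)
{
	return (int32_t)(uval & FUTEX_TID_MASK);
}

/* Hypothetical helper: can this TID be stored without clobbering the flags? */
static inline int tid_fits_futex_word(int64_t tid)
{
	return tid > 0 && tid <= (int64_t)FUTEX_TID_MASK;
}
```

Any TID at or above 1 << 30 would overlap FUTEX_OWNER_DIED, which is why a wider pid_t would presumably need a different futex word layout.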
Yes, there would be a lot of work to do to increase the size of pid_t.
I'd just like to not do anything to make that harder _now_. Stick to
using pid_t within the kernel.
Just so people are aware. If you look in include/linux/threads.h you
can see that the maximum value of PID_MAX_LIMIT limits pids to 22 bits.
Further, the design decisions of pids keep us densely using pids. So I
expect it will be a while before we even come close to using 30 bits of
pid space.
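As a rough illustration of the 22-bit figure, the sketch below mirrors the shape of the PID_MAX_LIMIT definition in include/linux/threads.h, assuming CONFIG_BASE_SMALL is off. The names PID_MAX_DEFAULT_SKETCH, pid_max_limit_for() and pid_bits_needed() are hypothetical and exist only for this example.

```c
#include <assert.h>

/* Assumption: CONFIG_BASE_SMALL=0, so the default pid_max is 0x8000. */
#define PID_MAX_DEFAULT_SKETCH 0x8000

/*
 * Same shape as PID_MAX_LIMIT in include/linux/threads.h, but with
 * sizeof(long) passed in so both the 32-bit and 64-bit cases can be
 * evaluated on one machine.
 */
#define pid_max_limit_for(long_size) \
	((long_size) > 4 ? 4 * 1024 * 1024 : PID_MAX_DEFAULT_SKETCH)

static_assert(pid_max_limit_for(8) == 1 << 22, "64-bit limit is 2^22");

/* Hypothetical helper: bits needed to represent every pid below 'limit'. */
static inline unsigned int pid_bits_needed(unsigned long limit)
{
	unsigned int bits = 0;

	while ((1ul << bits) < limit)
		bits++;
	return bits;
}
```

With a 64-bit long this gives a limit of 4194304 pids, i.e. 22 bits, well short of the 30 bits available in a futex word.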
Also because it's simply annoying to have to type really large pid
numbers on the shell. Yes yes, that's a very privileged
developer-centric complaint but it matters when you have to do a quick
kill -9. Chromebook users obviously won't care about how large their
pids are for sure.
Tbf, related to discussions last year, systemd now actually raises the
default limit from ~33000 to 4194304. Which seems like an ok compromise.
Christian