On Wed, Aug 19, 2020 at 08:32:59AM -0500, Eric W. Biederman wrote:
> Matthew Wilcox <willy@xxxxxxxxxxxxx> writes:
>
> > On Wed, Aug 19, 2020 at 10:45:56AM +0200, Christian Brauner wrote:
> >> On Wed, Aug 19, 2020 at 09:43:40AM +0200, peterz@xxxxxxxxxxxxx wrote:
> >> > On Tue, Aug 18, 2020 at 06:44:47PM +0100, Matthew Wilcox wrote:
> >> > > On Tue, Aug 18, 2020 at 07:34:00PM +0200, Christian Brauner wrote:
> >> > > > The only remaining function callable outside of kernel/fork.c is
> >> > > > _do_fork(). It doesn't really follow the naming of kernel-internal
> >> > > > syscall helpers as Christoph rightly pointed out. Switch all callers and
> >> > > > references to kernel_clone() and remove _do_fork() once and for all.
> >> > >
> >> > > My only concern is around return type. long, int, pid_t ... can we
> >> > > choose one and stick to it? pid_t is probably the right return type
> >> > > within the kernel, despite the return type of clone3(). It'll save us
> >> > > some work if we ever go through the hassle of growing pid_t beyond 31-bit.
> >> >
> >> > We have at least the futex ABI restricting PID space to 30 bits.
> >>
> >> Ok, looking into kernel/futex.c I see
> >>
> >>         pid_t pid = uval & FUTEX_TID_MASK;
> >>
> >> which is probably what this refers to and /proc/sys/kernel/threads-max
> >> is restricted to FUTEX_TID_MASK.
> >>
> >> Afaict, that doesn't block switching kernel_clone() to return pid_t. It
> >> can't create anything > FUTEX_TID_MASK anyway without yelling EAGAIN at
> >> userspace. But it means that _if_ we were to change the size of pid_t
> >> we'd likely need a new futex API.
> >
> > Yes, there would be a lot of work to do to increase the size of pid_t.
> > I'd just like to not do anything to make that harder _now_. Stick to
> > using pid_t within the kernel.
>
> Just so people are aware. If you look in include/linux/threads.h you
> can see that the maximum value of PID_MAX_LIMIT limits pids to 22 bits.
>
> Further the design decisions of pids keep us densely using pids. So I
> expect it will be a while before we even come close to using 30 bits of
> pid space.

Also because it's simply annoying to have to type really large pid
numbers on the shell. Yes yes, that's a very privileged
developer-centric complaint but it matters when you have to do a quick
kill -9. Chromebook users obviously won't care about how large their
pids are for sure.

Tbf, related to discussions last year, systemd now actually raises the
default limit from ~33000 to 4194304. Which seems like an ok compromise.

Christian
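
For anyone who wants to double-check the bit counts mentioned in this
thread, here is a rough userspace sketch. The SKETCH_* constants are
copied by hand and are assumptions of this example: 0x3fffffff for
FUTEX_TID_MASK, 4 * 1024 * 1024 for PID_MAX_LIMIT on a 64-bit
!CONFIG_BASE_SMALL build, and 0x8000 for PID_MAX_DEFAULT. The helper
names bits_needed() and futex_owner_tid() are made up for illustration
and are not kernel functions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hand-copied values, assumed for illustration only. They are not
 * pulled from kernel headers at build time.
 */
#define SKETCH_FUTEX_TID_MASK  0x3fffffffu          /* futex owner TID field */
#define SKETCH_PID_MAX_LIMIT   (4u * 1024u * 1024u) /* 64-bit, !CONFIG_BASE_SMALL */
#define SKETCH_PID_MAX_DEFAULT 0x8000u              /* default pid_max */

/* Number of bits needed to represent v (0 for v == 0). */
static unsigned int bits_needed(uint32_t v)
{
	unsigned int n = 0;

	while (v) {
		n++;
		v >>= 1;
	}
	return n;
}

/*
 * Hypothetical helper mirroring the "uval & FUTEX_TID_MASK" line quoted
 * above: strip the flag bits and keep only the owner TID.
 */
static uint32_t futex_owner_tid(uint32_t uval)
{
	return uval & SKETCH_FUTEX_TID_MASK;
}
```

Since pids stay below pid_max, the largest pid to feed into
bits_needed() is the limit minus one. Under the assumptions above,
bits_needed(SKETCH_FUTEX_TID_MASK) gives the futex bound and
bits_needed(SKETCH_PID_MAX_LIMIT - 1) gives the current hard cap.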