On Wed, Jun 15, 2022 at 10:14:19AM +0200, Florian Weimer wrote:
> * Christian Brauner:
>
> > For pid namespaces one problem would be that it could end up confusing a
> > process about its own pid. This was a more serious problem when the pid
> > cache was still active in glibc; but fwiw systemd still has a pid cache
> > afair.
>
> Right. glibc still has a TID cache, mainly for use with recursive
> mutexes (where we need a 32-bit thread identifier and can't perform a
> system call on every locking operation for performance reasons).
> Assuming that a non-delayed CLONE_NEWPID would also change the TID
> underneath us, we'd have subtly broken recursive mutexes.

Fwiw, you can't use CLONE_NEWPID together with CLONE_THREAD. This
guarantees that threads can send signals to each other and that all
threads within the same threadgroup can be reached via proc. It'd be
awkward if you had a thread whose thread-group leader lives in an
ancestor pidns. Even if you made the whole threadgroup change pid
namespaces immediately, it would mean allocating a new TGID and new TIDs
in the new pid namespace, unless the old numbers happen not to be
allocated there already.

>
> vfork gets away with not updating the TID cache (which is shared with
> the parent process) because the parent process is suspended while the
> new subprocess is still running and has not execve'ed yet.
>
> Now one could argue that calling unshare automatically means that you
> must not call any glibc functions afterwards (similar to thread-creating
> clone), or at least that you cannot call any functions which are not
> async-signal-safe, but that does not match existing application
> practice. And I think we actually prefer that file servers call chroot

Yeah, that'd be a rather subtle and risky change for pid namespaces.
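
For anyone who wants to see the CLONE_NEWPID/CLONE_THREAD restriction mentioned above from userspace, here is a minimal sketch. The helper name try_thread_in_new_pidns() and the 64 KiB stack size are made up for illustration and are not from this thread. It assumes Linux with glibc's clone() wrapper. clone(2) documents EINVAL for this flag combination; a seccomp sandbox may report a different error first, so the helper only reports whatever errno it got.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <sched.h>
#include <stdlib.h>

/* Not expected to run: the kernel should refuse the flag combination. */
static int child_fn(void *arg)
{
    (void)arg;
    return 0;
}

/*
 * Hypothetical helper: try to create a thread that lives in a new pid
 * namespace. Returns 0 if clone() unexpectedly succeeded, otherwise the
 * errno value the call failed with.
 */
int try_thread_in_new_pidns(void)
{
    const size_t stack_size = 64 * 1024; /* arbitrary size for the sketch */
    char *stack = malloc(stack_size);
    if (stack == NULL)
        return ENOMEM;

    /* CLONE_THREAD requires CLONE_SIGHAND, which in turn requires CLONE_VM. */
    int flags = CLONE_VM | CLONE_SIGHAND | CLONE_THREAD | CLONE_NEWPID;

    /* Pass the top of the buffer; the stack grows down on common targets. */
    int ret = clone(child_fn, stack + stack_size, flags, NULL);
    if (ret == -1) {
        int err = errno;
        free(stack);
        return err;
    }

    /* Unexpected success: the new thread owns the stack, so leave it alone. */
    return 0;
}
```

Calling try_thread_in_new_pidns() from main() and printing strerror() of the result should show the rejection. The check does not need CAP_SYS_ADMIN on a mainline kernel as far as I can tell, since the flag validation happens before the namespace permission check.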