Re: [PATCH 1/2] fs/exec: allow to unshare a time namespace on vfork+exec

Christian Brauner <brauner@xxxxxxxxxx> · Wed, 15 Jun 2022 10:53:50 +0200

On Wed, Jun 15, 2022 at 10:14:19AM +0200, Florian Weimer wrote:
> * Christian Brauner:
> 
> > For pid namespaces one problem would be that it could end up confusing a
> > process about its own pid. This was a more serious problem when the pid
> > cache was still active in glibc; but fwiw systemd still has a pid cache
> > afair.
> 
> Right.  glibc still has a TID cache, mainly for use with recursive
> mutexes (where we need a 32-bit thread identifier and can't perform a
> system call on every locking operation for performance reasons).
> Assuming that a non-delayed CLONE_NEWPID would also change the TID
> underneath us, we'd have subtly broken recursive mutexes.

Fwiw, you can't combine CLONE_NEWPID with CLONE_THREAD. This guarantees
that threads can send signals to each other and that all threads within
the same thread group can be reached via proc. It'd be awkward to have
a thread whose thread-group leader lives in an ancestor pidns.

Even if you made the whole thread group change pid namespaces immediately,
it would mean allocating a new TGID and new TIDs in the new pid namespace;
the old numbers could only be kept if they happened, by accident, to be
free there.

> 
> vfork gets away with not updating the TID cache (which is shared with
> the parent process) because the parent process is suspended while the
> new subprocess is still running and has not execve'ed yet.
> 
> Now one could argue that calling unshare automatically means that you
> must not call any glibc functions afterwards (similar to thread-creating
> clone), or at least that you cannot call any functions which are not
> async-signal-safe, but that does not match existing application
> practice.  And I think we actually prefer that file servers call chroot

Yeah, that'd be a rather subtle and risky change for pid namespaces.