On Wed, Jun 15, 2022 at 09:53:29AM +0200, Florian Weimer wrote:
> * Kees Cook:
>
> > On Sun, Jun 12, 2022 at 11:07:22PM -0700, Andrei Vagin wrote:
> >> Right now, a new process can't be forked in another time namespace
> >> if it shares mm with its parent. It is prohibited, because each time
> >> namespace has its own vvar page that is mapped into a process address
> >> space.
> >>
> >> When a process calls exec, it gets a new mm and so it could be "legal"
> >> to switch time namespace in that case. This was not implemented and
> >> now if we want to do this, we need to add another clone flag to not
> >> break backward compatibility.
> >>
> >> We don't have any user requests to switch times on exec except the
> >> vfork+exec combination, so there is no reason to add a new clone flag.
> >> As for vfork+exec, this should be safe to allow switching timens with
> >> the current clone flag. Right now, vfork (CLONE_VFORK | CLONE_VM) fails
> >> if a child is forked into another time namespace. With this change,
> >> vfork creates a new process in parent's timens, and the following exec
> >> does the actual switch to the target time namespace.
> >
> > This seems like a very special case. None of the other namespaces do
> > this, do they?
>
> I think this started with CLONE_NEWPID, which had a similar delayed
> effect with unshare: it happens only after fork, not for the current
> process image. I think it's just a limitation of the unshare interface.
> Some of the effects simply have to be delayed due to their nature.

I tried to give more context in another mail with respect to time
namespaces specifically. For pid namespaces, one problem would be that
it could end up confusing a process about its own pid. This was a more
serious problem when the pid cache was still active in glibc; but fwiw,
systemd still has a pid cache afair.

Christian
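
As a rough illustration of the pid-cache concern above, here is a minimal sketch of how a userspace cache of getpid() can go stale. It is an analogy only: it uses a plain fork() rather than unshare(CLONE_NEWPID), and the names PidCache and child_sees_stale_pid are hypothetical. It is not how glibc or systemd implement their caches.

```python
import os


class PidCache:
    """Hypothetical userspace pid cache (illustration only)."""

    def __init__(self):
        self._pid = None

    def getpid(self):
        # Ask the kernel once, then keep returning the remembered value.
        if self._pid is None:
            self._pid = os.getpid()
        return self._pid


def child_sees_stale_pid():
    """Return True if a forked child's inherited cache disagrees with the kernel."""
    cache = PidCache()
    cache.getpid()  # populate the cache in the parent

    read_fd, write_fd = os.pipe()
    child = os.fork()
    if child == 0:
        # Child: the cache was copied from the parent and still holds the
        # parent's pid, while os.getpid() asks the kernel directly.
        os.close(read_fd)
        stale = cache.getpid() != os.getpid()
        os.write(write_fd, b"1" if stale else b"0")
        os._exit(0)

    # Parent: collect the child's verdict and reap it.
    os.close(write_fd)
    verdict = os.read(read_fd, 1)
    os.close(read_fd)
    os.waitpid(child, 0)
    return verdict == b"1"


if __name__ == "__main__":
    print("child saw a stale cached pid:", child_sees_stale_pid())
```

If unshare(CLONE_NEWPID) moved the calling process itself into the new pid namespace, the value the kernel reports for getpid() would change under a running process, and any cache like the one sketched here would keep answering with the old pid. Delaying the effect until the next fork avoids that.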