On Tue, Aug 30, 2022 at 12:49 PM Alexey Izbyshev <izbyshev@xxxxxxxxx> wrote: > > Hi, > > I've looked at Andrei's patch[1] that permitted vfork() after > unshare(CLONE_NEWTIME) and noticed a couple of odd things that I'd like > to point out. > > /* > * If the new process will be in a different time namespace > * do not allow it to share VM or a thread group with the forking > task. > + * > + * On vfork, the child process enters the target time namespace only > + * after exec. > */ > - if (clone_flags & (CLONE_THREAD | CLONE_VM)) { > + if ((clone_flags & (CLONE_VM | CLONE_VFORK)) == CLONE_VM) { > if (nsp->time_ns != nsp->time_ns_for_children) > return ERR_PTR(-EINVAL); > } > > This change permits not only a normal vfork(), but also > clone(CLONE_VM|CLONE_VFORK|CLONE_SIGHAND|CLONE_THREAD). I'm not sure > whether it can cause real harm, but it's pretty inconsistent to forbid > creation of normal threads after unshare(CLONE_NEWTIME), but permit such > weird ones, so maybe the check should be strengthened. Good catch. I was not aware that CLONE_VFORK is allowed to be used with CLONE_THREAD. I will send a fix. Thanks. > > Also, if such a thread execs, no time namespace switch will happen > because it's vfork_done field will be cleared when its creator (a > sibling thread) is killed by de_thread(). > > + vfork = !!tsk->vfork_done; > old_mm = current->mm; > exec_mm_release(tsk, old_mm); > if (old_mm) > @@ -1030,6 +1033,10 @@ static int exec_mmap(struct mm_struct *mm) > tsk->mm->vmacache_seqnum = 0; > vmacache_flush(tsk); > task_unlock(tsk); > + > + if (vfork) > + timens_on_fork(tsk->nsproxy, tsk); > + > > Similarly, even after a normal vfork(), time namespace switch could be > silently skipped if the parent dies before "tsk->vfork_done" is read. > Again, I don't know whether anybody cares, but this behavior seems > non-obvious and probably unintended to me. This is the more interesting case. I will try to find out how we can handle it properly. Thanks, Andrei