Hi,
I've looked at Andrei's patch[1] that permitted vfork() after
unshare(CLONE_NEWTIME) and noticed a couple of odd things that I'd like
to point out.
 	/*
 	 * If the new process will be in a different time namespace
 	 * do not allow it to share VM or a thread group with the forking task.
+	 *
+	 * On vfork, the child process enters the target time namespace only
+	 * after exec.
 	 */
-	if (clone_flags & (CLONE_THREAD | CLONE_VM)) {
+	if ((clone_flags & (CLONE_VM | CLONE_VFORK)) == CLONE_VM) {
 		if (nsp->time_ns != nsp->time_ns_for_children)
 			return ERR_PTR(-EINVAL);
 	}
This change permits not only a normal vfork(), but also
clone(CLONE_VM|CLONE_VFORK|CLONE_SIGHAND|CLONE_THREAD). I'm not sure
whether it can cause real harm, but it's pretty inconsistent to forbid
creation of normal threads after unshare(CLONE_NEWTIME), but permit such
weird ones, so maybe the check should be strengthened.
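To make the inconsistency concrete, here is a minimal userspace sketch that
evaluates the old and new predicates from the hunk above for a few flag
combinations. It assumes the caller has already done unshare(CLONE_NEWTIME),
i.e. that nsp->time_ns != nsp->time_ns_for_children holds. The DEMO_* constants
and all function names are made up for this illustration, and
stricter_check_rejects() is only one possible way to tighten the check, not a
proposed patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Flag values as in the Linux uapi header <linux/sched.h>, redefined
 * under DEMO_* names so the sketch builds without kernel headers.
 */
#define DEMO_CLONE_VM		0x00000100UL
#define DEMO_CLONE_SIGHAND	0x00000800UL
#define DEMO_CLONE_VFORK	0x00004000UL
#define DEMO_CLONE_THREAD	0x00010000UL

/* Check before the patch: any CLONE_VM or CLONE_THREAD child is refused. */
static bool old_check_rejects(unsigned long flags)
{
	return (flags & (DEMO_CLONE_THREAD | DEMO_CLONE_VM)) != 0;
}

/* Check after the patch: only CLONE_VM without CLONE_VFORK is refused. */
static bool new_check_rejects(unsigned long flags)
{
	return (flags & (DEMO_CLONE_VM | DEMO_CLONE_VFORK)) == DEMO_CLONE_VM;
}

/* Hypothetical stricter variant: additionally refuse every CLONE_THREAD. */
static bool stricter_check_rejects(unsigned long flags)
{
	if (flags & DEMO_CLONE_THREAD)
		return true;
	return new_check_rejects(flags);
}

/* Walks through the three cases discussed in the mail; returns 0 if all hold. */
int run_flag_examples(void)
{
	unsigned long vfork_flags = DEMO_CLONE_VM | DEMO_CLONE_VFORK;
	unsigned long thread_flags = DEMO_CLONE_VM | DEMO_CLONE_SIGHAND |
				     DEMO_CLONE_THREAD;
	unsigned long weird_flags = thread_flags | DEMO_CLONE_VFORK;

	/* Plain vfork(): refused before the patch, permitted after it. */
	assert(old_check_rejects(vfork_flags));
	assert(!new_check_rejects(vfork_flags));
	assert(!stricter_check_rejects(vfork_flags));

	/* Normal thread: refused by every variant. */
	assert(old_check_rejects(thread_flags));
	assert(new_check_rejects(thread_flags));
	assert(stricter_check_rejects(thread_flags));

	/* Thread created with CLONE_VFORK: slips through the new check. */
	assert(old_check_rejects(weird_flags));
	assert(!new_check_rejects(weird_flags));
	assert(stricter_check_rejects(weird_flags));

	return 0;
}
```

Calling run_flag_examples() from a main() and building with assertions enabled
exercises all three cases; the third one is the combination I mean by "weird".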
Also, if such a thread execs, no time namespace switch will happen
because its vfork_done field will be cleared when its creator (a
sibling thread) is killed by de_thread().
+	vfork = !!tsk->vfork_done;
 	old_mm = current->mm;
 	exec_mm_release(tsk, old_mm);
 	if (old_mm)
@@ -1030,6 +1033,10 @@ static int exec_mmap(struct mm_struct *mm)
 	tsk->mm->vmacache_seqnum = 0;
 	vmacache_flush(tsk);
 	task_unlock(tsk);
+
+	if (vfork)
+		timens_on_fork(tsk->nsproxy, tsk);
+
Similarly, even after a normal vfork(), the time namespace switch could
be silently skipped if the parent dies before "tsk->vfork_done" is read.
Again, I don't know whether anybody cares, but this behavior seems
non-obvious and probably unintended to me.
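The ordering problem in both cases above can be shown with a toy userspace
model. This is not kernel code: struct toy_task, toy_parent_killed() and
toy_exec() are hypothetical stand-ins that keep only the vfork_done field and
the conditional from the exec_mmap() hunk, and the model simply assumes that a
killed creator leaves the child's vfork_done cleared.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's struct completion; contents are irrelevant. */
struct toy_completion {
	int unused;
};

/* Only the two pieces of task state that matter for this discussion. */
struct toy_task {
	struct toy_completion *vfork_done;	/* non-NULL while the creator waits */
	bool in_target_timens;			/* true once the switch happened */
};

/* Models the creator being killed while waiting for the child. */
static void toy_parent_killed(struct toy_task *child)
{
	child->vfork_done = NULL;
}

/* Models the patched exec_mmap(): the switch depends on vfork_done. */
static void toy_exec(struct toy_task *tsk)
{
	bool vfork = !!tsk->vfork_done;

	if (vfork)
		tsk->in_target_timens = true;	/* stands in for timens_on_fork() */
}

/* Runs one scenario and reports whether the child switched namespaces. */
bool toy_exec_switches_timens(bool parent_dies_first)
{
	struct toy_completion done = { 0 };
	struct toy_task child = {
		.vfork_done = &done,
		.in_target_timens = false,
	};

	if (parent_dies_first)
		toy_parent_killed(&child);

	toy_exec(&child);
	return child.in_target_timens;
}
```

In this model toy_exec_switches_timens(false) reports a switch, while
toy_exec_switches_timens(true) does not, and nothing tells the child that the
switch was skipped.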
Thanks,
Alexey
[1] https://lore.kernel.org/all/20220613060723.197407-1-avagin@xxxxxxxxx/