Hi Rasmus,

On 2/7/19 8:31 AM, Rasmus Villemoes wrote:
> These (14-19, if I'm reading them right) seems to add quite a lot of
> complexity and fragility to the build, and other architectures would
> probably have to add something similar to their vdso builds.
>
> I'm wondering why not make the rule be that a timens takes effect on
> next execve?

I believe it would make the setns() syscall much trickier than wanted.
At the moment the only exception is pidns, which changes the namespace
of the children rather than of the calling process.

If exec() were required to join a timens, it might be a challenging
problem for container systems: in order to enter it, one would need to
exec("/proc/self/exe") and add some new arguments/options.
Furthermore, it seems to me that to enter a container with these
semantics, one would need to enter the timens before entering the
mountns.

IOW, I believe this would move complexity from kernel build time to the
userspace ABI. And I guess it would require much more logic to
re-create a possibly nested hierarchy of namespaces.

Rather, I've considered using some kind of dynamic patching in
vdso_init():
o static_branch - it would add some nops to the !timens vdso;
o something new like a static_retpoline, which would put a RET over the
  call to clk_to_ns(); shouldn't be rocket science.

But in my view, if something can be done at compile time instead of by
patching code dynamically, then that reduces the complexity (it depends
less on what the compiler/toolchain does).

Thanks,
Dmitry