Re: RFC: on adding new CLONE_* flags [WAS Re: [PATCH 0/4] clone: add CLONE_PIDFD]

Florian Weimer <fweimer@xxxxxxxxxx> · Tue, 30 Apr 2019 10:21:20 +0200

* Linus Torvalds:

> Note that vfork() is "exciting" for the compiler in much the same way
> "setjmp/longjmp()" is, because of the shared stack use in the child
> and the parent. It is *very* easy to get this wrong and cause massive
> and subtle memory corruption issues because the parent returns to
> something that has been messed up by the child.

Just using a wrapper around vfork is enough for that, if the return
address is saved on the stack.  It's surprising hard to write a test
case for that, but the corruption is definitely there.

> (In fact, if I recall correctly, the _reason_ we have an explicit
> 'vfork()' entry point rather than using clone() with magic parameters
> was that the lack of arguments meant that you didn't have to
> save/restore any registers in user space, which made the whole stack
> issue simpler. But it's been two decades, so my memory is bitrotting).

That's an interesting point.  Using a callback-style interface avoids
that because you never need to restore the registers in the new
subprocess.  It's still appropriate to use an assembler implementation,
I think, because it will be more obviously correct.

> Also, particularly if you have a big address space, vfork()+execve()
> can be quite a bit faster than fork()+execve(). Linux fork() is pretty
> efficient, but if you have gigabytes of VM space to copy, it's going
> to take time even if you do it fairly well.

vfork is also more benign from a memory accounting perspective.  In some
environments, it's not possible to call fork from a large process
because the accounting assumes (conservatively) that the new process
will dirty a lot of its private memory.

Thanks,
Florian