On 5/21/22 13:12, Matthew Wilcox wrote:
> On Sat, May 21, 2022 at 06:07:27PM +0200, David Hildenbrand wrote:
> > I'm missing the most important point: why do we care and why should we
> > care to make our COW/fork implementation even more complicated?
> > Yes, we might save some page tables and we might reduce the fork() time,
> > however, which specific workload really benefits from this and why do we
> > really care about that workload? Without even hearing about an example
> > user in this cover letter (unless I missed it), I naturally wonder about
> > relevance in practice.
>
> As I get older (and crankier), I get less convinced that fork() is
> really the right solution for implementing system(). I feel that a
> better model is to create a process with zero threads, but have an fd
> to it. Then manipulate the child process through its fd (eg mmap
> ld.so, open new fds in that process's fdtable, etc). Closing the fd
> launches a new thread in the process (ensuring nobody has an fd to a
> running process, particularly one which is setuid).

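For comparison, the closest thing POSIX already offers to "create a process without calling fork()" is posix_spawn(). It hides the fork/exec step from the caller but has none of the fd-based, build-the-child-first semantics described above. Below is a minimal system()-like sketch on top of it, assuming /bin/sh exists. run_shell() is a hypothetical name, and unlike the real system() it does not block SIGCHLD or ignore SIGINT/SIGQUIT while waiting:

```c
#include <errno.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>

extern char **environ;

/* Hypothetical system()-like helper: run `cmd` via /bin/sh -c without calling
 * fork() directly. Returns the shell's exit status, or -1 if spawning failed,
 * waiting failed, or the shell was killed by a signal. */
static int run_shell(const char *cmd)
{
    pid_t pid;
    char *argv[] = { "sh", "-c", (char *)cmd, NULL };

    /* posix_spawn() returns an error number (not -1/errno) on failure. */
    if (posix_spawn(&pid, "/bin/sh", NULL, NULL, argv, environ) != 0)
        return -1;

    int status;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) /* retry only if interrupted by a signal */
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, run_shell("exit 3") returns 3.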
Heh, I learned serious programming on Windows, and I thought fork() was
entertaining, cool, and a bad idea when I first learned about it. (I
admit I did think the fact that POSIX fork and exec had many fewer
arguments than CreateProcess was a good thing.) Don't even get me
started on setuid -- if I had my way, distros would set NO_NEW_PRIVS on
boot for the entire system.
I can see a rather different use for this type of shared-pagetable
technology, though: monstrous MAP_SHARED mappings. For database and
some VM users, multiple processes will map the same file. If there was
a way to ensure appropriate alignment (or at least encourage it) and a
way to handle mappings that don't cover the whole file, then having
multiple mappings share the same page tables could be a decent
efficiency gain. This doesn't even need COW -- it's "just" pagetable
sharing.
It's probably a pipe dream, but I like to imagine that the bookkeeping
that would enable this would also enable a much less ad-hoc concept of
who owns which pagetable page. Then things like x86's KPTI LDT mappings
would be less disgusting under the hood.

Android would probably like a similar feature for MAP_ANONYMOUS, or
something else that would let Zygote share paging structures (ideally
without fork(), although that's my dream, not necessarily Android's).
This is more complex, since COW is involved. It's also possibly less
valuable: the entire benefit, and then some, might be achieved by using
huge pages for Zygote and arranging for a COW fault on one normal-size
page within a huge-page COW mapping to copy only that one page.
--Andy