On 16.08.21 17:59, Matthew Wilcox wrote:
On Mon, Aug 16, 2021 at 05:01:44PM +0200, David Hildenbrand wrote:
On 16.08.21 16:40, Matthew Wilcox wrote:
On Mon, Aug 16, 2021 at 04:33:09PM +0200, David Hildenbrand wrote:
I did not follow why we have to play games with MAP_PRIVATE, and having
private anonymous pages shared between processes that don't COW, introducing
new syscalls etc.
It's not about SHMEM, it's about file-backed pages on regular
filesystems. I don't want to have XFS, ext4 and btrfs all with their
own implementations of ARCH_WANT_HUGE_PMD_SHARE.
Let me ask this way: why do we have to play such games with MAP_PRIVATE?
: Mappings within this address range behave as if they were shared
: between threads, so a write to a MAP_PRIVATE mapping will create a
: page which is shared between all the sharers.
If so, that's a misunderstanding, because there are no games being played.
What Khalid's saying there is that because the page tables are already
shared for that range of address space, the COW of a MAP_PRIVATE will
create a new page, but that page will be shared between all the sharers.
The second write to a MAP_PRIVATE page (by any of the sharers) will not
create a COW situation. Just like if all the sharers were threads of
the same process.
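
For reference, the baseline that the paragraph above contrasts against can be checked from user space. The following is a minimal sketch, not the proposed interface: it only shows today's behavior, where a write to a MAP_PRIVATE anonymous page after fork() triggers COW and stays invisible to the parent, while the same write to a MAP_SHARED page is seen by both. The helper name value_seen_by_parent() is hypothetical and exists only for this illustration.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS under strict -std= modes */
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: map one anonymous page with the given sharing flag
 * (MAP_PRIVATE or MAP_SHARED), store 1 in it, fork, let the child store 42,
 * and return the value the parent reads afterwards. Returns -1 on error.
 */
static int value_seen_by_parent(int share_flag)
{
	long page = sysconf(_SC_PAGESIZE);
	int *p;
	pid_t pid;
	int seen;

	if (page <= 0)
		return -1;

	p = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		 share_flag | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	*p = 1;			/* fault the page in before forking */

	pid = fork();
	if (pid < 0) {
		munmap(p, (size_t)page);
		return -1;
	}
	if (pid == 0) {
		*p = 42;	/* MAP_PRIVATE: COW gives the child its own copy */
		_exit(0);
	}

	if (waitpid(pid, NULL, 0) < 0) {
		munmap(p, (size_t)page);
		return -1;
	}

	seen = *p;		/* parent's view after the child's write */
	munmap(p, (size_t)page);
	return seen;
}
```

With current kernels, value_seen_by_parent(MAP_PRIVATE) yields the parent's original value and value_seen_by_parent(MAP_SHARED) yields the child's value. Under the semantics described above, sharers of the range would instead behave like threads of one process even for MAP_PRIVATE.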
It actually seems to be just like I understood it. We'll have multiple
processes share anonymous pages writable, even though they are not using
shared memory.
IMHO, sharing page tables to optimize for something kernel-internal (page
table consumption) should be completely transparent to user space. Just like
ARCH_WANT_HUGE_PMD_SHARE currently is, unless I am missing something
important.
The VM_MAYSHARE check in want_pmd_share()->vma_shareable() makes me assume
that we really only optimize for MAP_SHARED right now, never for
MAP_PRIVATE.
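
To make the check mentioned above concrete, here is a simplified user-space model of it, written from my reading of the hugetlb code and not copied from the kernel. The struct vma_model type, the function name vma_shareable_model() and the constant values are assumptions for illustration only; the real definitions live in the kernel headers and depend on the architecture.

```c
#include <stdbool.h>

/* Illustrative values only; the kernel defines these per architecture. */
#define VM_MAYSHARE	0x00000080UL
#define PUD_SIZE	(1UL << 30)
#define PUD_MASK	(~(PUD_SIZE - 1))

/* Hypothetical stand-in for the few vm_area_struct fields the check uses. */
struct vma_model {
	unsigned long vm_start;
	unsigned long vm_end;
	unsigned long vm_flags;
};

/*
 * Model of the shareability test: the mapping must be shareable
 * (VM_MAYSHARE, i.e. it originated from MAP_SHARED) and the VMA must
 * cover the whole PUD-aligned range that contains addr.
 */
static bool vma_shareable_model(const struct vma_model *vma,
				unsigned long addr)
{
	unsigned long base = addr & PUD_MASK;
	unsigned long end = base + PUD_SIZE;
	bool may_share = (vma->vm_flags & VM_MAYSHARE) != 0;
	bool covers = vma->vm_start <= base && end <= vma->vm_end;

	return may_share && covers;
}
```

In this model a MAP_PRIVATE mapping never qualifies, because VM_MAYSHARE is not set for it, which matches the assumption that only MAP_SHARED is optimized today.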
It's definitely *not* about being transparent to userspace. It's about
giving userspace new functionality where multiple processes can choose
to share a portion of their address space with each other. Whatever any
process changes in that range, every sharing process sees.
mmap(), munmap(), mprotect(), mremap(), everything.
Oh okay, so it's actually much more complex than I thought. Thanks for
clarifying that! I recall virtiofsd had similar requirements for sharing
memory with the QEMU main process, although I might be wrong.
"existing shared memory area" and your initial page table example made
me assume that we are simply dealing with sharing page tables of MAP_SHARED.
It's actually something like a VMA container that you share between
processes. And whatever VMAs are currently inside that VMA container are
mirrored to other processes. I assume sharing page tables could actually
be an implementation detail, especially when keeping MAP_PRIVATE
(confusing in that context!) and other features that will give you
surprises (uffd) out of the picture.
--
Thanks,
David / dhildenb