On Mon, Aug 16, 2021 at 10:06:47AM -0600, Khalid Aziz wrote:
> On 8/16/21 9:59 AM, Matthew Wilcox wrote:
> > On Mon, Aug 16, 2021 at 05:01:44PM +0200, David Hildenbrand wrote:
> > > On 16.08.21 16:40, Matthew Wilcox wrote:
> > > > On Mon, Aug 16, 2021 at 04:33:09PM +0200, David Hildenbrand wrote:
> > > > > > > I did not follow why we have to play games with MAP_PRIVATE, and having
> > > > > > > private anonymous pages shared between processes that don't COW, introducing
> > > > > > > new syscalls etc.
> > > > > >
> > > > > > It's not about SHMEM, it's about file-backed pages on regular
> > > > > > filesystems. I don't want to have XFS, ext4 and btrfs all with their
> > > > > > own implementations of ARCH_WANT_HUGE_PMD_SHARE.
> > > > >
> > > > > Let me ask this way: why do we have to play such games with MAP_PRIVATE?
> > > >
> > > > : Mappings within this address range behave as if they were shared
> > > > : between threads, so a write to a MAP_PRIVATE mapping will create a
> > > > : page which is shared between all the sharers.
> > > >
> > > > If so, that's a misunderstanding, because there are no games being played.
> > > > What Khalid's saying there is that because the page tables are already
> > > > shared for that range of address space, the COW of a MAP_PRIVATE will
> > > > create a new page, but that page will be shared between all the sharers.
> > > > The second write to a MAP_PRIVATE page (by any of the sharers) will not
> > > > create a COW situation. Just like if all the sharers were threads of
> > > > the same process.
> > > >
> > >
> > > It actually seems to be just like I understood it. We'll have multiple
> > > processes share anonymous pages writable, even though they are not using
> > > shared memory.
> > >
> > > IMHO, sharing page tables to optimize for something kernel-internal (page
> > > table consumption) should be completely transparent to user space. Just like
> > > ARCH_WANT_HUGE_PMD_SHARE currently is unless I am missing something
> > > important.
> > >
> > > The VM_MAYSHARE check in want_pmd_share()->vma_shareable() makes me assume
> > > that we really only optimize for MAP_SHARED right now, never for
> > > MAP_PRIVATE.
> >
> > It's definitely *not* about being transparent to userspace. It's about
> > giving userspace new functionality where multiple processes can choose
> > to share a portion of their address space with each other. What any
> > process changes in that range changes, every sharing process sees.
> > mmap(), munmap(), mprotect(), mremap(), everything.
> >
>
> Exactly and to further elaborate, once a process calls mshare() to declare
> its intent to share PTEs for a range of address and another process accepts
> that sharing by calling mshare() itself, the two (or more) processes have
> agreed to share PTEs for that entire address range. A MAP_PRIVATE mapping in
> this address range goes against the original intent of sharing and what we
> are saying is the original intent of sharing takes precedence in case of
> this conflict.

I don't know that it's against the original intent ... I think
MAP_PRIVATE in this context means "Private to this process and every
process sharing this chunk of address space". So a store doesn't go
through to the page cache, as it would with MAP_SHARED, but it is
visible to the other processes sharing these page tables.
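
For anyone who wants to see the semantics described above in today's
kernel, here is a rough userspace sketch. mshare() is only a proposal, so
this uses two threads of one process as a stand-in for "sharers of the same
page tables". private_store_demo() and reader_thread() are hypothetical
names made up for illustration; only the POSIX calls are real. It stores to
a MAP_PRIVATE file mapping, then reports what a second thread reads through
the mapping and what pread() still finds in the file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: runs on the same page tables as the writer. */
static void *reader_thread(void *arg)
{
	return (void *)(intptr_t)*(volatile char *)arg;
}

/*
 * Hypothetical demo: store to a MAP_PRIVATE file mapping, then report
 * what another thread sees through the mapping and what the file holds.
 * Returns 0 on success, -1 on any setup failure.
 */
int private_store_demo(char *seen_by_thread, char *seen_in_file)
{
	char path[] = "/tmp/map_private_demo_XXXXXX";
	long page = sysconf(_SC_PAGESIZE);
	pthread_t tid;
	void *ret;
	char *map;
	int rc = -1;
	int fd;

	assert(page > 0);
	fd = mkstemp(path);
	if (fd < 0)
		return -1;
	unlink(path);			/* file lives on via fd only */
	if (pwrite(fd, "A", 1, 0) != 1)
		goto out;

	map = mmap(NULL, (size_t)page, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE, fd, 0);
	if (map == MAP_FAILED)
		goto out;

	map[0] = 'B';			/* first store triggers the COW */

	if (pthread_create(&tid, NULL, reader_thread, map) == 0 &&
	    pthread_join(tid, &ret) == 0) {
		*seen_by_thread = (char)(intptr_t)ret;
		/* The file is read via the page cache, not the mapping. */
		if (pread(fd, seen_in_file, 1, 0) == 1)
			rc = 0;
	}
	munmap(map, (size_t)page);
out:
	close(fd);
	return rc;
}
```

Calling private_store_demo(&t, &f) should leave the stored byte in t and
the original file byte in f: the store is visible to everything running on
those page tables, but it never reaches the page cache. Whether mshare()
ends up behaving exactly like this across processes is what the thread is
still discussing.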