On Mon, Jul 31, 2023 at 06:48:47PM +0200, David Hildenbrand wrote:
> On 31.07.23 18:38, Matthew Wilcox wrote:
> > On Mon, Jul 31, 2023 at 06:30:22PM +0200, David Hildenbrand wrote:
> > > Assume we do do the page table sharing at mmap time, if the flags are right.
> > > Let's focus on the most common:
> > >
> > > mmap(memfd, PROT_READ | PROT_WRITE, MAP_SHARED)
> > >
> > > And doing the same in each and every process.
> >
> > That may be the most common in your usage, but for a database, you're
> > looking at two usage scenarios.  Postgres calls mmap() on the database
> > file itself so that all processes share the kernel page cache.
> > Some Commercial Databases call mmap() on a hugetlbfs file so that all
> > processes share the same userspace buffer cache.  Other Commercial
> > Databases call shmget() / shmat() with SHM_HUGETLB for the exact
> > same reason.
>
> I remember you said that postgres might be looking into using shmem as well,
> maybe I am wrong.

No, I said that postgres was also interested in sharing page tables.
I don't think they have any use for shmem.

> memfd/hugetlb/shmem could all be handled alike, just "arbitrary filesystems"
> would require more work.

But arbitrary filesystems was one of the original use cases: the
database is stored on a persistent memory filesystem, and neither the
kernel nor userspace has a cache.  The Postgres & Commercial Database
use-cases collapse into the same case, and we want to mmap the files
directly and share the page tables.

> > This is why I proposed mshare().  Anyone can use it for anything.
> > We have such a diverse set of users who want to do stuff with shared
> > page tables that we should not be tying it to memfd or any other
> > filesystem.  Not to mention that it's more flexible; you can map
> > individual 4kB files into it and still get page table sharing.
>
> That's not what the current proposal does, or am I wrong?

I think you're wrong, but I haven't had time to read the latest patches.

> Also, I'm curious, is that a real requirement in the database world?

I don't know.  It's definitely an advantage that falls out of the design
of mshare.
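
For readers following along, here is a minimal userspace sketch of the
"most common" case quoted above: mapping one memfd MAP_SHARED more than
once.  It is an illustration only, not code from the mshare patches.
The helper names (make_backing_fd, map_shared, demo) are made up for
this example, and the two mappings live in one process purely to keep
the sketch short; in the scenario discussed, each mapping would be in a
different process.  The pages are shared, but as far as I understand
each mapping still gets its own page table entries today, which is the
duplication this thread is about.

```python
import mmap
import os
import tempfile

PAGE = mmap.PAGESIZE


def make_backing_fd(size):
    """Return a file descriptor backing `size` bytes of shareable memory."""
    if hasattr(os, "memfd_create"):
        # Linux-only: an anonymous, memory-backed file.
        fd = os.memfd_create("demo")
    else:
        # Assumption for portability only: fall back to an unlinked temp file.
        tmp = tempfile.TemporaryFile()
        fd = os.dup(tmp.fileno())
        tmp.close()
    os.ftruncate(fd, size)
    return fd


def map_shared(fd, size):
    """Equivalent of mmap(fd, PROT_READ | PROT_WRITE, MAP_SHARED)."""
    return mmap.mmap(
        fd,
        size,
        flags=mmap.MAP_SHARED,
        prot=mmap.PROT_READ | mmap.PROT_WRITE,
    )


def demo():
    """Write through one mapping and read the bytes back through another."""
    size = 4 * PAGE
    fd = make_backing_fd(size)
    a = map_shared(fd, size)
    b = map_shared(fd, size)
    a[:5] = b"hello"        # store via the first mapping
    seen = bytes(b[:5])     # load via the second mapping
    a.close()
    b.close()
    os.close(fd)
    return seen


if __name__ == "__main__":
    print(demo())
```

The same map_shared() call works on any file descriptor, so pointing it
at a file on hugetlbfs or on a regular filesystem models the database
cases described earlier in the thread.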