Re: [LSF/MM/BPF TOPIC] Sharing page tables across processes (mshare)

"Christoph Lameter (Ampere)" <cl@xxxxxxxxx> · Tue, 14 May 2024 11:21:37 -0700 (PDT)

> 1. The amount of memory required for PTEs to map physical pages stays low
> even when a large number of threads share the same pages, since PTEs are
> shared across threads.
>
> 2. Page protection attributes are shared across threads, and a change
> of attributes applies immediately to every thread without any overhead
> of coordinating protection bit changes across threads.
>
> These advantages no longer apply when unrelated processes share pages.
> Large database applications can easily comprise 1000s of processes
> that share 100s of GB of pages. In cases like this, the amount of memory
> consumed by page tables can exceed the size of the actual shared data.
> On a database server with a 300GB SGA, a system crash was seen with an
> out-of-memory condition when 1500+ clients tried to share this SGA, even
> though the system had 512GB of memory. On this server, in the worst-case
> scenario of all 1500 processes mapping every page of the SGA, the PTEs
> alone would have required 878GB+.
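The 878GB figure quoted above follows from simple arithmetic. The sketch below makes several assumptions: 4 KiB base pages, 8-byte PTEs (as on x86-64), "GB" meaning GiB, and only last-level PTEs counted. The helper names are made up for illustration. Under those assumptions, each process needs 1/512 of the mapped size in PTEs, which is 600 MiB for a 300 GiB SGA. Total page-table memory therefore exceeds the shared data itself once more than 512 processes map all of it.

```c
#include <assert.h>
#include <stdint.h>

#define PTE_SIZE 8ULL     /* bytes per PTE, assuming x86-64 */
#define PAGE_4K  4096ULL  /* base page size, assumed 4 KiB */

/* PTE bytes one process needs to map `seg_bytes` with 4 KiB pages
 * (last-level entries only; upper-level tables are ignored). */
static uint64_t pte_bytes_per_proc(uint64_t seg_bytes)
{
    return (seg_bytes + PAGE_4K - 1) / PAGE_4K * PTE_SIZE;
}

/* Smallest number of processes, each mapping the whole segment through
 * its own private PTEs, for which total PTE memory exceeds the segment. */
static uint64_t breakeven_procs(uint64_t seg_bytes)
{
    uint64_t per_proc = pte_bytes_per_proc(seg_bytes);
    assert(per_proc > 0);
    return seg_bytes / per_proc + 1;
}
```

For 1500 processes, 1500 * 600 MiB comes to about 878.9 GiB, which matches the quoted worst case.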

OK, then use 1 GB pages or larger for a shared mapping of huge pages. I am 
not sure why page tables need to be shared here. I just listened to your 
talk at LSF/MM and noted a few things.
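The sketch below shows why larger pages shrink the problem. It counts leaf page-table entries for the same 300 GiB segment at each x86-64 page size. It assumes 8-byte entries at every level, with 2 MiB pages mapped by PMD entries and 1 GiB pages by PUD entries. It ignores upper-level tables and rounding up to whole 4 KiB table pages. The enum and function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* x86-64 leaf page sizes and the table level whose entries map them. */
enum leaf_level { LEAF_PTE_4K, LEAF_PMD_2M, LEAF_PUD_1G };

static uint64_t leaf_page_size(enum leaf_level lvl)
{
    switch (lvl) {
    case LEAF_PTE_4K: return 4ULL << 10;  /* 4 KiB page, PTE */
    case LEAF_PMD_2M: return 2ULL << 20;  /* 2 MiB page, PMD entry */
    case LEAF_PUD_1G: return 1ULL << 30;  /* 1 GiB page, PUD entry */
    }
    return 0;                             /* unknown level */
}

/* Leaf-entry bytes for `nproc` processes that each map all of
 * `seg_bytes` through their own page tables (8-byte entries assumed). */
static uint64_t leaf_bytes_all(uint64_t seg_bytes, enum leaf_level lvl,
                               uint64_t nproc)
{
    uint64_t size = leaf_page_size(lvl);
    assert(size != 0);
    uint64_t entries = (seg_bytes + size - 1) / size;  /* round up */
    return entries * 8 * nproc;
}
```

For 1500 processes and 300 GiB, this gives roughly 879 GiB with 4 KiB pages, about 1.7 GiB with 2 MiB pages, and 3,600,000 bytes (about 3.4 MiB) with 1 GiB pages.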

It may be best to follow established shared memory approaches, such as the 
one already implemented in shmem.

If you want to do it with actual page table sharing semantics, then the 
proper implementation using shmem would probably be to add an additional 
flag. Let's call it O_SHARED_PAGE_TABLE for now.

Then you would do

fd = shm_open("/shared_pagetable_segment", O_CREAT|O_RDWR|O_SHARED_PAGE_TABLE, 0666);

The remaining handling is straightforward and the shmem subsystem already 
provides consistent handling of shared memory segments.
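On the userspace side, this proposal maps onto the standard POSIX shm_open()/ftruncate()/mmap() sequence. In the sketch below, O_SHARED_PAGE_TABLE is the hypothetical flag suggested above and does not exist in any kernel. It is defined as 0 here, so the code builds and behaves as ordinary POSIX shared memory today. map_shared_segment() is a made-up helper name.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical flag from the proposal; not a real kernel flag.
 * Defined as 0 so this sketch compiles as plain POSIX shm. */
#ifndef O_SHARED_PAGE_TABLE
#define O_SHARED_PAGE_TABLE 0
#endif

/* Create or open the named segment, size it to `len` bytes and map it
 * MAP_SHARED. Returns the mapping, or MAP_FAILED on error. */
static void *map_shared_segment(const char *name, size_t len)
{
    assert(name[0] == '/');  /* portable POSIX shm names start with '/' */
    int fd = shm_open(name, O_CREAT | O_RDWR | O_SHARED_PAGE_TABLE, 0666);
    if (fd < 0)
        return MAP_FAILED;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return MAP_FAILED;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);  /* the mapping stays valid after the fd is closed */
    return p;
}
```

Every process that calls the helper with the same name gets a MAP_SHARED view of the same pages. Under the proposal, only the flag would change, and sharing the page tables behind those mappings would become the kernel's job. On glibc older than 2.34, link with -lrt.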

What you would have to do is sort out the kernel-internal problems 
created by sharing sections of page tables between SHM VMAs. Beyond that, 
only limited changes are required, confined to special types of VMA and 
the shmem subsystem. So the impact on the kernel overall is limited, and 
you are following an established method of managing shared memory.

I actually also need something like shared page tables for another 
in-kernel use case: defining sections of kernel virtual memory that are 
specific to CPUs or nodes. Some abstracted functions to manage page 
tables that share PGD, PUD, or PMD levels would be good to have in the 
kernel, if you don't mind.

But for this use case I'd suggest using gigabyte shmem mappings and 
being done with it.
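One existing way to get gigabyte shmem mappings on Linux is a System V segment created with SHM_HUGETLB and the SHM_HUGE_1GB size flag. The sketch assumes that 1 GiB huge pages have been reserved beforehand, for example with hugepagesz=1G hugepages=N on the kernel command line. It also assumes the caller is permitted to use them (see vm.hugetlb_shm_group). The helper names are hypothetical. The fallback #defines mirror the kernel's uapi values in case the libc headers lack them.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Fallbacks matching include/uapi/linux/shm.h, for older libc headers. */
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif
#ifndef SHM_HUGE_SHIFT
#define SHM_HUGE_SHIFT 26
#endif
#ifndef SHM_HUGE_1GB
#define SHM_HUGE_1GB (30 << SHM_HUGE_SHIFT)
#endif

#define GIB (1UL << 30)

/* Round a segment size up to a whole number of 1 GiB pages. */
static size_t round_up_1g(size_t size)
{
    return (size + GIB - 1) & ~(GIB - 1);
}

/* Create a SysV segment backed by 1 GiB huge pages. The size is rounded
 * up explicitly so the caller knows the real footprint. Returns the
 * segment id, or -1 with errno set (e.g. when no 1 GiB pages exist). */
static int create_1g_segment(key_t key, size_t size)
{
    size_t rounded = round_up_1g(size);
    assert(rounded >= size);
    return shmget(key, rounded, IPC_CREAT | 0600 | SHM_HUGETLB | SHM_HUGE_1GB);
}
```

With 1 GiB pages, the 300GB SGA from the quoted example needs only 300 leaf entries per process, instead of tens of millions of PTEs.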

https://lwn.net/Articles/375098/