Re: [LSF/MM/BPF TOPIC] Sharing page tables across processes (mshare)

Khalid Aziz <khalid.aziz@xxxxxxxxxx> · Fri, 17 May 2024 15:23:46 -0600

On 5/14/24 12:21, Christoph Lameter (Ampere) wrote:
1. The amount of memory required for PTEs to map physical pages stays low
even when a large number of threads share the same pages, since PTEs are
shared across threads.

2. Page protection attributes are shared across threads and a change
of attributes applies immediately to every thread without any overhead
of coordinating protection bit changes across threads.

These advantages no longer apply when unrelated processes share pages.
Large database applications can easily consist of thousands of processes
that share hundreds of GB of pages. In cases like this, the amount of memory
consumed by page tables can exceed the size of the actual shared data.
On a database server with a 300GB SGA, a system crash was seen with an
out-of-memory condition when 1500+ clients tried to share this SGA, even
though the system had 512GB of memory. On this server, the worst-case
scenario of all 1500 processes mapping every page of the SGA would have
required 878GB+ for the PTEs alone.
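
The 878GB+ figure can be reproduced with simple arithmetic. What follows is
a minimal sketch, assuming 4 KiB base pages and 8-byte PTEs (typical for
x86-64, but neither value is stated above) and counting only last-level
entries. The function name is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of last-level page table entries needed when every process maps
 * the whole region with its own page tables. Upper-level tables
 * (PMD/PUD/PGD) are ignored.
 * ASSUMPTION (not from the mail): callers pass page_size = 4096 and
 * pte_size = 8 for the server described above. */
uint64_t pte_bytes_total(uint64_t region_bytes, uint64_t page_size,
                         uint64_t pte_size, uint64_t nr_procs)
{
    assert(page_size > 0);
    /* Round up so a partial trailing page still gets an entry. */
    uint64_t nr_pages = (region_bytes + page_size - 1) / page_size;
    return nr_pages * pte_size * nr_procs;
}
```

Under those assumptions, pte_bytes_total(300ULL << 30, 4096, 8, 1500)
returns 943718400000 bytes, about 878.9 GiB, or 600 MiB per process.
Passing 1ULL << 30 as the page size instead gives 3600000 bytes in total.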

Ok then use 1Gig pages or higher for a shared mapping of huge pages. I am not sure why there is a need for sharing page 
tables here. I just listened to your talk at the LSF/MM and noted some things.

It may be best to follow established shared memory approaches, such as those already implemented in shmem.

If you want to do it with actual page table sharing semantics, then the proper implementation using shmem would
maybe be to add an additional flag. Let's call this O_SHARED_PAGE_TABLE for now.

Then you would do

fd = shm_open("shared_pagetable_segment", O_CREAT|O_RDWR|O_SHARED_PAGE_TABLE, 0666);

The remaining handling is straightforward and the shmem subsystem already provides consistent handling of shared memory 
segments.
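
For reference, here is a minimal sketch of what that could look like from
userspace, using the existing POSIX calls shm_open(), ftruncate() and
mmap(). O_SHARED_PAGE_TABLE is only the flag proposed above and does not
exist in any kernel, so the sketch defines it as 0 and behaves like
ordinary POSIX shared memory. The helper name map_shared_segment is
likewise invented for this example.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* HYPOTHETICAL: the proposed flag does not exist. Defining it as 0 lets
 * the sketch build and run as plain POSIX shared memory. */
#ifndef O_SHARED_PAGE_TABLE
#define O_SHARED_PAGE_TABLE 0
#endif

/* Create (or open) a named segment, size it, and map it MAP_SHARED.
 * Returns the mapping, or NULL on any failure. */
void *map_shared_segment(const char *name, size_t len)
{
    int fd = shm_open(name, O_CREAT | O_RDWR | O_SHARED_PAGE_TABLE, 0666);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)len) < 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

Every process would call map_shared_segment() with the same name, for
example "/shared_pagetable_segment". With the flag defined as 0, each
caller still gets its own PTEs for the segment. Sharing them is the part
the kernel would have to implement.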

What you would have to do is sort out the kernel-internal problems created by sharing page table sections when using
SHM vmas. But with that, only limited changes are required to special types of vma and the shmem subsystem. So the
impact on the kernel overall is limited and you are following an established method of managing shared memory.

I actually need something like shared page tables for another in-kernel page table use case as well, in order to define
sections in kernel virtual memory that are special for CPUs or nodes. Some abstracted functions to manage page tables
that share pgd, pud, pmd would be good to have in the kernel, if you don't mind.

But for this use case I'd suggest using gigabyte shmem mappings and being done with it.

https://lwn.net/Articles/375098/

Hello Christoph,

Thanks for the feedback. Yes, shmem can address this specific case, and a solution using shmem with hugepages is in use
currently. There are two issues with that: (1) it addresses only this specific problem and does not address page table
sharing in the general case, which, from what I hear from many other people, is indeed needed, and (2) hugepages have to
be pre-allocated, which is not a flexible solution. Even though hugepages can be added at any time, the kernel does so on
a best-effort basis, and the latency to get the required number of hugepages can be unpredictable. So a more general
solution that does not depend on hugepages can be more useful in the long run, and it can help other cases as well, like yours.

Thanks,
Khalid