[LSF/MM/BPF TOPIC] Sharing page tables across processes (mshare)

Khalid Aziz <khalid.aziz@xxxxxxxxxx> · Wed, 28 Feb 2024 15:56:37 -0700

Threads of a process share an address space and page tables, which
provides two key advantages:

1. The amount of memory required for PTEs to map physical pages stays
low even when a large number of threads share the same pages, since the
PTEs are shared across threads.

2. Page protection attributes are shared across threads and a change
of attributes applies immediately to every thread without any overhead
of coordinating protection bit changes across threads.

These advantages no longer apply when unrelated processes share pages.
Large database applications can easily consist of 1000s of processes
that share 100s of GB of pages. In cases like this, the amount of memory
consumed by page tables can exceed the size of the actual shared data.
On a database server with a 300GB SGA, the system crashed with an
out-of-memory condition when 1500+ clients tried to share this SGA, even
though the system had 512GB of memory. On this server, the worst-case
scenario of all 1500 processes mapping every page of the SGA would have
required 878GB+ for just the PTEs.
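
The 878GB+ figure can be reproduced with a back-of-the-envelope
calculation. The sketch below is only an illustration: it assumes 4 KiB
base pages and 8-byte PTEs (as on x86-64), neither of which is stated
above, and it ignores upper-level page tables. The function name
pte_bytes is hypothetical.

```python
# Rough estimate of leaf page-table (PTE) memory for a shared mapping.
# Assumptions (not stated in the message): 4 KiB base pages, 8-byte
# PTEs as on x86-64, upper-level page tables ignored.

MIB = 1 << 20
GIB = 1 << 30

def pte_bytes(mapped_bytes, nprocs, page_size=4096, pte_size=8):
    """Bytes of leaf PTEs when nprocs processes each map mapped_bytes."""
    pages = mapped_bytes // page_size
    return pages * pte_size * nprocs

sga = 300 * GIB
per_process = pte_bytes(sga, 1)      # one private set of PTEs
unshared = pte_bytes(sga, 1500)      # every process has its own PTEs

print(f"per process:          {per_process / MIB:.0f} MiB")   # 600 MiB
print(f"1500 private copies:  {unshared / GIB:.1f} GiB")      # 878.9 GiB
print(f"one shared copy:      {per_process / GIB:.2f} GiB")   # 0.59 GiB
```

Under these assumptions each process needs about 600 MiB of PTEs to map
the whole SGA, so 1500 private copies come to roughly 879 GiB, while a
single shared set of page tables would stay at about 0.6 GiB no matter
how many processes attach.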

I have sent proposals and patches to solve this problem by adding a
kernel mechanism that lets processes opt into sharing page tables with
other processes. We have had discussions on the original proposal and
subsequent refinements, but we have not converged on a solution. As
systems with multi-TB memory and in-memory databases become more and
more common, this is becoming a significant issue.
An interactive discussion can help us reach a consensus on how to
solve this.

Thanks,
Khalid

References:

https://lore.kernel.org/lkml/cover.1642526745.git.khalid.aziz@xxxxxxxxxx/
https://lore.kernel.org/lkml/cover.1656531090.git.khalid.aziz@xxxxxxxxxx/
https://lore.kernel.org/lkml/cover.1682453344.git.khalid.aziz@xxxxxxxxxx/
https://lore.kernel.org/lkml/4082bc40-a99a-4b54-91e5-a1b55828d202@xxxxxxxxxx/