Re: [PATCH RFC v2 0/4] Add support for sharing page tables across processes (Previously mshare)

David Hildenbrand <david@xxxxxxxxxx> · Mon, 31 Jul 2023 18:30:22 +0200

On 31.07.23 18:19, Rongwei Wang wrote:

On 2023/7/31 20:50, David Hildenbrand wrote:
On 31.07.23 14:25, Matthew Wilcox wrote:
On Mon, Jul 31, 2023 at 12:35:00PM +0800, Rongwei Wang wrote:
Hi Matthew

May I ask you another question about mshare under this RFC? I remember you
said at the last MM alignment session that you would redesign mshare to be
per-VMA rather than per-mapping (apologies if I remember wrongly). Following
your suggestion, I also re-coded this part in our internal version (based on
this RFC). It seems that a per-VMA design can simplify the structure of page
table sharing, and it does not even need to care about the different
permissions of file mappings. These are the (possible) advantages I can
imagine. But IMHO, they do not seem to be a strong reason to switch from
per-mapping to per-VMA.

And I can't think of any other upstream considerations. Could you share the
reason for redesigning it in a per-VMA way? Is it due to integration with
hugetlbfs page table sharing, or with anonymous page sharing?

It was David who wants to make page table sharing be per-VMA.  I think
he is advocating for the wrong approach.  In any case, I don't have time
to work on mshare and Khalid is on leave until September, so I don't
think anybody is actively working on mshare.

Not that I have any time to look into this either, but my comment
essentially was that we should try decoupling page table sharing
(reduced memory consumption, shorter rmap walks) from the
mprotect(PROT_READ) use case.

Hi David, Matthew

Thanks for your reply.

Uh, sorry, I can't see the relation between decoupling page table
sharing and the per-VMA design. And I think mprotect(PROT_READ) has to
modify all the shared page tables of the related tasks. It seems that I am
missing something about per-VMA in your words.

Assume we do the page table sharing at mmap time, if the flags are
right. Let's focus on the most common case:

mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0)

And the same is done in each and every process.
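A minimal sketch of this common case, assuming Linux with glibc 2.27 or later (for memfd_create()). Here fork() stands in for "each and every process", and the memfd name and REGION_SIZE are arbitrary choices for illustration. Each process's own mmap() call creates its own VMA; without any page table sharing, each process also populates its own page tables for that VMA.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

#define REGION_SIZE 4096

/* Map the memfd shared and writable, as every participating process would. */
static char *map_region(int memfd)
{
    void *p = mmap(NULL, REGION_SIZE, PROT_READ | PROT_WRITE, MAP_SHARED, memfd, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Returns 0 if a write made by a child through its own mapping is visible
 * through the parent's separate mapping of the same memfd. */
static int demo_shared_memfd(void)
{
    int memfd = memfd_create("shared-region", 0);
    if (memfd < 0 || ftruncate(memfd, REGION_SIZE) < 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: its own mmap() call creates its own VMA. */
        char *child = map_region(memfd);
        if (!child)
            _exit(1);
        strcpy(child, "hello from child");
        _exit(0);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0 || !WIFEXITED(status) || WEXITSTATUS(status) != 0)
        return -1;

    /* Parent: a second, independent mapping of the same memfd pages. */
    char *parent = map_region(memfd);
    if (!parent)
        return -1;
    int ok = strcmp(parent, "hello from child") == 0 ? 0 : -1;
    munmap(parent, REGION_SIZE);
    close(memfd);
    return ok;
}
```

The data is shared through the memfd, but the mappings (and, today, the page tables) are per process.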

With the original design, doing an mprotect(PROT_READ) in each and
every process just to protect a memfd page is absolutely inefficient.

For that case, my thought was that you actually want to write-protect
the pages at the memfd level.

So instead of doing mprotect(PROT_READ) in 999 processes, or doing
mprotect(PROT_READ) on mshare(), you would have a memfd feature to protect
pages from any write access: not using virtual addresses, but using an
offset into the memfd.
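To see why mprotect() has to be repeated per process: it changes the protection of one virtual address range (one VMA), not of the memfd page behind it. In this sketch (again assuming Linux and glibc 2.27+), two mappings inside a single process stand in for two processes; after write-protecting mapping A, the same memfd page can still be modified through mapping B.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define PAGE 4096

/*
 * Returns 1 if, after mprotect(PROT_READ) on mapping A, the same memfd page
 * could still be modified through mapping B (and the change is visible via A).
 */
static int write_through_other_mapping(void)
{
    int fd = memfd_create("protect-demo", 0);
    if (fd < 0 || ftruncate(fd, PAGE) < 0)
        return -1;

    /* Two independent VMAs backed by the same memfd page. */
    char *a = mmap(NULL, PAGE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, PAGE, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (a == MAP_FAILED || b == MAP_FAILED)
        return -1;

    /* Write-protect only mapping A, as one process would with mprotect(). */
    if (mprotect(a, PAGE, PROT_READ) < 0)
        return -1;

    /* Mapping B still has PROT_WRITE, so this store does not fault. */
    strcpy(b, "written via B");
    int result = strcmp(a, "written via B") == 0;

    munmap(a, PAGE);
    munmap(b, PAGE);
    close(fd);
    return result;
}
```

Protecting the page this way requires one mprotect() per mapping, which is the per-process cost described above.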

Assume such a (badly imagined) memfd_protect(PROT_READ) would make sure
that:
(1) Any page table mappings of the page are write-protected,
(2) Any write access using the page table mappings triggers write-notify, and
(3) Any other access (e.g., write()) similarly informs memfd.
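The three properties can be made concrete with a toy userspace model. Everything here is hypothetical: memfd_protect() does not exist, and toy_memfd, toy_memfd_protect() and toy_write() are invented purely for illustration. Protection is recorded per page offset in the memfd, and both a store through any mapping (2) and a write() call (3) go through the same check and notify memfd. Property (1), write-protecting the page table entries so that stores actually fault, is abstracted away: the check in toy_write() stands in for that fault.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: names and layout are hypothetical, not a kernel API. */
#define TOY_PAGES 8

struct toy_memfd {
    bool write_protected[TOY_PAGES]; /* keyed by page offset, not by virtual address */
    unsigned notifications;          /* stands in for write-notify / "inform memfd" */
};

enum toy_access { TOY_VIA_MAPPING, TOY_VIA_WRITE_SYSCALL };

/* Hypothetical memfd_protect(PROT_READ) over pages [first, first + count). */
static void toy_memfd_protect(struct toy_memfd *m, size_t first, size_t count)
{
    for (size_t i = first; i < first + count && i < TOY_PAGES; i++)
        m->write_protected[i] = true;
}

/*
 * Every write path, a store through any process's mapping (2) or a write()
 * call (3), checks the same per-offset state and notifies memfd when it is
 * blocked. Returns true if the write is allowed.
 */
static bool toy_write(struct toy_memfd *m, size_t page, enum toy_access how)
{
    (void)how; /* both paths are handled identically in this model */
    if (page >= TOY_PAGES)
        return false;
    if (m->write_protected[page]) {
        m->notifications++;
        return false;
    }
    return true;
}
```

Note that nothing in the model refers to a process or a virtual address; that is the point of doing the protection at the memfd level.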

Without page table sharing, (1) would have to walk all mappings via the 
rmap. With page table sharing, it would only have to walk one page table.
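The cost difference for (1) can be sketched with another toy model (hypothetical names, not kernel code). The rmap of one memfd page lists every page table that maps it. With 999 processes and no sharing, write-protecting the page walks 999 page tables; if all processes share one page table, the page is mapped only once and the walk touches a single table.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model: hypothetical names; a "page table" is reduced to one PTE write bit. */
#define MAX_PROCS 1000

struct toy_pgtable {
    bool pte_writable;
};

/* The rmap of one memfd page: every page table that maps it. */
struct toy_rmap {
    struct toy_pgtable *mappers[MAX_PROCS];
    size_t count;
};

static struct toy_pgtable toy_tables[MAX_PROCS];

/* Without sharing: each process has its own page table, so the rmap has nprocs entries. */
static void setup_private(struct toy_rmap *r, size_t nprocs)
{
    if (nprocs > MAX_PROCS)
        nprocs = MAX_PROCS;
    r->count = 0;
    for (size_t i = 0; i < nprocs; i++) {
        toy_tables[i].pte_writable = true;
        r->mappers[r->count++] = &toy_tables[i];
    }
}

/* With sharing: all processes use toy_tables[0], which maps the page once. */
static void setup_shared(struct toy_rmap *r, size_t nprocs)
{
    (void)nprocs; /* however many processes share it, the rmap sees one mapping */
    toy_tables[0].pte_writable = true;
    r->mappers[0] = &toy_tables[0];
    r->count = 1;
}

/* Property (1): write-protect every PTE found via the rmap; return the number of page tables walked. */
static size_t toy_wrprotect(struct toy_rmap *r)
{
    for (size_t i = 0; i < r->count; i++)
        r->mappers[i]->pte_writable = false;
    return r->count;
}
```

The protection semantics are the same in both setups; only the number of page tables walked changes, which is why the two features can be kept separate.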

But the features would be two separate things.

What memfd would do with that write notification (inject a signal, 
something like uffd) would be a different story.

Again, just an idea and maybe complete garbage.

--
Cheers,

David / dhildenb