On 2023/7/31 20:50, David Hildenbrand wrote:
On 31.07.23 14:25, Matthew Wilcox wrote:
On Mon, Jul 31, 2023 at 12:35:00PM +0800, Rongwei Wang wrote:
Hi Matthew

May I ask you another question about mshare under this RFC? I remember
you said you would redesign mshare to be per-VMA rather than per-mapping
(apologies if I remember wrongly) in the last MM alignment session, and
I also referred to that when re-coding this part in our internal version
(based on this RFC). It seems that per-VMA sharing can simplify the
structure of page table sharing, and it doesn't even have to care about
the different permissions of file mappings; those are the advantages
(maybe) that I can imagine. But IMHO, they don't seem a strong enough
reason to switch from per-mapping to per-VMA, and I can't imagine the
other considerations upstream has. Can you share the reason for
redesigning it in a per-VMA way? Is it for integration with hugetlbfs
page table sharing or anonymous page sharing?
It was David who wants to make page table sharing per-VMA; I think
he is advocating for the wrong approach. In any case, I don't have time
to work on mshare, and Khalid is on leave until September, so I don't
think anybody is actively working on mshare.
Note that I also don't have any time to look into this, but my comment
essentially was that we should try decoupling page table sharing
(reduced memory consumption, shorter rmap walks) from the
mprotect(PROT_READ) use case.
Hi David, Matthew

Thanks for your reply.

Uh, sorry, I can't see the relation between decoupling page table
sharing and the per-VMA design. And I think mprotect(PROT_READ) has to
modify all the shared page tables of the related tasks. It seems I am
missing something about per-VMA in your words.
BTW, I can imagine a corner case that shows a defect (maybe) of
per-mapping: if we create a range of page table sharing via
memfd_create(), a child process also owns this range of page table
sharing, but that child can not create page table sharing based on the
same fd again after munmap()ing the range (same mapping, but a
different VMA area). Per-VMA is of course the better choice there:
because a new mm struct is created that way, sharing can continue to
be created based on the original fd. I guess that is a type of the
decoupling you mentioned? It's just a corner case, though; I am not
sure how important it is.
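
A minimal userspace sketch of that scenario, using plain
memfd_create()/mmap(); the page-table-sharing behavior itself is what
this RFC would add, so it only appears in the comments:

	#define _GNU_SOURCE
	#include <unistd.h>
	#include <sys/mman.h>
	#include <sys/wait.h>

	#define LEN (1UL << 30)	/* 1 GiB, i.e. many PMD_SIZE units */

	int main(void)
	{
		int fd = memfd_create("pgtable-share", 0);
		void *p;

		if (fd < 0 || ftruncate(fd, LEN))
			return 1;

		/*
		 * Parent maps the file; under this RFC the page tables
		 * for this range would be shared via the mapping.
		 */
		p = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		if (fork() == 0) {
			/* Child inherits the mapping (and the sharing) ... */
			munmap(p, LEN);
			/*
			 * ... but per-mapping, a fresh mmap() of the same fd
			 * is a new VMA on the same mapping, and re-joining
			 * the shared page tables is the part that breaks.
			 * Per-VMA, it could simply join again.
			 */
			p = mmap(NULL, LEN, PROT_READ | PROT_WRITE,
				 MAP_SHARED, fd, 0);
			return p == MAP_FAILED;
		}
		wait(NULL);
		return 0;
	}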
For page table sharing I was wondering whether there could be ways to
just have that done semi-automatically, similar to how it's done for
hugetlb. There are some clear limitations: mappings < PMD_SIZE won't
be able to benefit.
It's still unclear whether that is a real limitation. Some use cases
were raised (put all user space library mappings into a shared area),
but I realized that these conflict with MAP_PRIVATE requirements of
such areas. Maybe I'm wrong and this is easily resolved.
At least it's not the primary use case that was raised. For the
primary use cases (VMs, databases) that map huge areas, it might not
be a limitation.
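
To make the PMD_SIZE point concrete: only the fully PMD-aligned middle
of a mapping can point at shared PTE tables. A toy check (PMD_SIZE
hard-coded to the x86-64 value, and the helper name made up):

	#include <stdbool.h>
	#include <stdio.h>

	#define PMD_SIZE	  (1UL << 21)	/* 2 MiB on x86-64 */
	#define PMD_ALIGN_UP(x)	  (((x) + PMD_SIZE - 1) & ~(PMD_SIZE - 1))
	#define PMD_ALIGN_DOWN(x) ((x) & ~(PMD_SIZE - 1))

	/*
	 * Hypothetical eligibility check: a mapping can only share the
	 * PTE tables of PMD entries it covers completely, so anything
	 * smaller than PMD_SIZE has nothing to share.
	 */
	static bool range_shareable(unsigned long start, unsigned long end)
	{
		return PMD_ALIGN_DOWN(end) > PMD_ALIGN_UP(start);
	}

	int main(void)
	{
		printf("%d\n", range_shareable(0x200000, 0x300000)); /* 0: 1 MiB */
		printf("%d\n", range_shareable(0x200000, 0x600000)); /* 1: 4 MiB */
		return 0;
	}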
Regarding mprotect(PROT_READ), my point was that mprotect() is most
probably the wrong tool to use (especially due to signal handling).
Instead, I was suggesting having a way to essentially protect pages in
a shmem file -- and to get notified whenever anything wants to write to
such a page, either via the page tables or via write() and friends. We
do have the write-notify infrastructure for filesystems in place that
we might extend/reuse.
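
For the page-table side, that existing hook is page_mkwrite: it already
runs before a read-only PTE of a shared mapping is made writable. A
rough sketch of how it might be used here, with the protection
predicate entirely hypothetical (and write() and friends still needing
their own check in the write path):

	#include <linux/fs.h>
	#include <linux/mm.h>

	/* Hypothetical predicate: did userspace mark this page read-only? */
	bool shmem_page_is_protected(struct inode *inode, pgoff_t index);

	/*
	 * Write-notify for the fault path: refusing here turns an
	 * attempted store into SIGBUS instead of a silent modification
	 * of a protected page; returning 0 lets the normal path make
	 * the PTE writable.
	 */
	static vm_fault_t shmem_protected_page_mkwrite(struct vm_fault *vmf)
	{
		struct inode *inode = file_inode(vmf->vma->vm_file);

		if (shmem_page_is_protected(inode, vmf->pgoff))
			return VM_FAULT_SIGBUS;

		return 0;
	}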
I am not very familiar with filesystems. Write-notify sounds like a
good idea; maybe I need some time to digest it.
That mechanism could benefit from shared page tables by having to do
fewer rmap walks.
Again, I don't have time to look into that (just like everybody else,
as it appears) and might be missing something important. I'm just
sharing the thoughts I raised in the call.
Your words are very helpful to me. I will try to design our internal
version of this feature in the right way.

Thanks again.