On 09.06.24 22:08, Matthew Wilcox wrote:
On Fri, Jun 07, 2024 at 08:59:17AM +0200, David Hildenbrand wrote:
On 06.06.24 20:29, Matthew Wilcox wrote:
One of the things we discussed at LSFMM was unifying the hugetlb and
THP page table walkers. I've been looking into it some more recently;
I've found a problem and I think a solution.
The reason we have a separate hugetlb_entry from pmd_entry and pud_entry
is that it has a different locking context. It is called with the
hugetlb_vma_lock held for read (nb: this is not the same as the vma
lock; see walk_hugetlb_range()). Why do we need this? Because of page
table sharing.
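To make the locking difference concrete, here is a minimal user-space sketch of the dispatch described above. It is not kernel code: `vma_model`, `walk_ops_model`, `walk_range_model()` and `report_lock_held()` are hypothetical stand-ins, and a plain counter stands in for the hugetlb_vma_lock. It only shows that the hugetlb callback runs with the lock held for read while the pmd callback does not.

```c
#include <stdbool.h>

/* Hypothetical user-space stand-ins; none of these are kernel types. */
struct vma_model {
    bool is_hugetlb;      /* models VM_HUGETLB being set on the VMA */
    int  hugetlb_readers; /* models hugetlb_vma_lock held for read */
};

struct walk_ops_model {
    int (*pmd_entry)(struct vma_model *vma);
    int (*hugetlb_entry)(struct vma_model *vma);
};

/*
 * Models the dispatch: a hugetlb VMA takes a separate path that wraps
 * the callback in the per-VMA hugetlb lock; any other VMA goes straight
 * to pmd_entry with no such lock held.
 */
static int walk_range_model(struct vma_model *vma,
                            const struct walk_ops_model *ops)
{
    int err = 0;

    if (vma->is_hugetlb) {
        if (!ops->hugetlb_entry)
            return 0;
        vma->hugetlb_readers++;        /* "lock for read" */
        err = ops->hugetlb_entry(vma);
        vma->hugetlb_readers--;        /* "unlock" */
        return err;
    }

    if (ops->pmd_entry)
        err = ops->pmd_entry(vma);
    return err;
}

/* Example callback: returns 1 if the modelled hugetlb lock is held. */
static int report_lock_held(struct vma_model *vma)
{
    return vma->hugetlb_readers > 0;
}
```

With the same callback installed in both slots, the result differs only by which path the VMA takes, which is the reason a unified walker has to deal with the lock first.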
In a completely separate discussion, I was talking with Khalid about
mshare() support for hugetlbfs, and I suggested that we permit hugetlbfs
pages to be mapped by a VMA which does not have the VM_HUGETLB flag set.
If we do that, the page tables would not be permitted to be shared with
other users of that hugetlbfs file. But we want to eliminate support
for that anyway, so that's more of a feature than a bug.
I am not sure why hugetlb support in mshare would require that (we don't
need partial mappings and all of that to support mshare+hugetlb).
You're absolutely right. My motivation is the other way around. A large
part of "hugetlbfs is special" is tied to the sharing of page tables.
That's why we have the hugetlb_vma_lock. If we're already sharing
page tables with mshare, I assert that it is not necessary to also
share page tables with other hugetlb users. So as part of including
hugetlb support in mshare, we should drop that support, and handle
hugetlb-mapped-with-mshare similarly to THP.
Yes, we should absolutely *not* use hugetlb-page table sharing there!
Regarding THP, I'm not sure how far we should go -- we should make our
lives easier by not allowing partial mappings initially.
Possibly not the mapcount parts so that we preserve the HVO.
The new mapcount scheme I'll be reviving soon will not work easily with
the existing hugetlb-page table sharing, simply because unrelated MMs
can map/unmap pages in there, and we effectively transfer ownership of
page tables between processes.
As soon as we have one "mm" that owns one set of shared page tables
(i.e., one mshare-mm that owns a set of shared page tables, and to which
we effectively account the mappings in there), it should all just work.
--
Cheers,
David / dhildenb