On 06.06.24 20:29, Matthew Wilcox wrote:
> One of the things we discussed at LSFMM was unifying the hugetlb and THP page table walkers. I've been looking into it some more recently; I've found a problem and I think a solution.
>
> The reason we have a separate hugetlb_entry from pmd_entry and pud_entry is that it has a different locking context. It is called with the hugetlb_vma_lock held for read (nb: this is not the same as the vma lock; see walk_hugetlb_range()). Why do we need this? Because of page table sharing.
>
> In a completely separate discussion, I was talking with Khalid about mshare() support for hugetlbfs, and I suggested that we permit hugetlbfs pages to be mapped by a VMA which does not have the VM_HUGETLB flag set. If we do that, the page tables would not be permitted to be shared with other users of that hugetlbfs file. But we want to eliminate support for that anyway, so that's more of a feature than a bug.
I am not sure why hugetlb support in mshare would require that (we don't need partial mappings and all of that to support mshare+hugetlb).
The possible mshare directions I discussed with Khalid at LSF/MM would likely not need that. But I have no idea which mshare design you and Khalid are discussing right now. Maybe it would be a good idea for the three of us to meet and discuss that, if my feedback/opinion could be helpful.
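For anyone following along, here is a rough userspace model of the locking split described above: a VMA carrying a VM_HUGETLB-style flag gets its own callback, invoked with a page-table-sharing lock held for read, while any other VMA goes through a pmd_entry-style callback with no extra lock. This is only a sketch under stated assumptions; every name in it (model_vma, model_walk_ops, model_walk_range, MODEL_VM_HUGETLB) is hypothetical, and it is not the kernel's mm_walk_ops, walk_page_range() or walk_hugetlb_range().

```c
#include <assert.h>

/*
 * Hypothetical userspace model only: none of these names exist in the
 * kernel, and this is not struct mm_walk_ops or walk_hugetlb_range().
 */
#define MODEL_VM_HUGETLB 0x1u

struct model_vma {
	unsigned long start;
	unsigned long end;
	unsigned int flags;
	/* Stands in for hugetlb_vma_lock: number of current read holders. */
	int share_lock_readers;
};

struct model_walk_ops {
	int (*pmd_entry)(struct model_vma *vma, unsigned long addr);
	int (*hugetlb_entry)(struct model_vma *vma, unsigned long addr);
};

/* Invokes one callback per step-sized chunk of [start, end). */
static int model_walk_range(struct model_vma *vma,
			    const struct model_walk_ops *ops,
			    unsigned long step)
{
	for (unsigned long addr = vma->start; addr < vma->end; addr += step) {
		int err;

		if (vma->flags & MODEL_VM_HUGETLB) {
			/* Page tables may be shared: hold the sharing lock for read. */
			vma->share_lock_readers++;
			err = ops->hugetlb_entry(vma, addr);
			vma->share_lock_readers--;
		} else {
			/* Unshared page tables: regular path, no extra lock. */
			err = ops->pmd_entry(vma, addr);
		}
		if (err)
			return err;
	}
	return 0;
}

static int pmd_calls, hugetlb_calls;

static int count_pmd(struct model_vma *vma, unsigned long addr)
{
	(void)addr;
	/* The regular path never runs under the sharing lock. */
	assert(vma->share_lock_readers == 0);
	pmd_calls++;
	return 0;
}

static int count_hugetlb(struct model_vma *vma, unsigned long addr)
{
	(void)addr;
	/* Precondition of the separate callback: sharing lock held for read. */
	assert(vma->share_lock_readers > 0);
	hugetlb_calls++;
	return 0;
}

static const struct model_walk_ops counting_ops = {
	.pmd_entry = count_pmd,
	.hugetlb_entry = count_hugetlb,
};
```

Walking an 8 MiB range in 2 MiB steps calls the hugetlb-style callback four times for the flagged VMA and the pmd-style callback four times for the unflagged one. The asserts inside the callbacks encode each path's locking precondition, loosely in the spirit of a lockdep assertion; in this model, a VMA without the flag never needs the sharing lock, which is why it can use the regular callback.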
> Once we don't use the VM_HUGETLB flag on these VMAs, that opens the door to the other features we want, like mapping individual pages from a hugetlb folio. And we can use the regular page table walkers for these VMAs.
Right, but to me that's a different, long-term project that mshare would maybe not have to rely on.
-- 
Cheers,

David / dhildenb