On Thu, Jun 06, 2024 at 05:49:30PM -0400, Peter Xu wrote:
> On Thu, Jun 06, 2024 at 07:29:22PM +0100, Matthew Wilcox wrote:
> > The reason we have a separate hugetlb_entry from pmd_entry and pud_entry
> > is that it has a different locking context. It is called with the
> > hugetlb_vma_lock held for read (nb: this is not the same as the vma
> > lock; see walk_hugetlb_range()). Why do we need this? Because of page
> > table sharing.
>
> Just to quickly comment on this one: I think it's more than the per-vma
> lock. Oscar is actually working together with me (we had plenty of
> discussions but so far all offlist...), and the lock context is as simple
> as this after refactor for hugetlb_entry() path:
>
> https://github.com/leberus/linux/commit/88e56c1ecaf8c64ba9165aeba74335bdc15d1b56

Yes, I reached out to Peter after LSFMM because I was highly interested in
helping out here.
We agreed that I would take the pagewalk part, and I already have some
patches in the works [1][2], based on a patchset I have been reviewing that
removes hugepd on powerpc [3].

Ideally, we should remove the exclusive use of 'pte' from hugetlb (unless it
is CONTPTE) and have it use pud/pmd where needed.
E.g.: if we look at the s390 version of huge_ptep_get(), which is the most
special one I would say: huge_ptep_get()->__rste_to_pte(), or the other way
around (__pte_to_rste()), what it does is convert a pud/pmd entry into a pte
or vice versa, since hugetlb "can" only work with ptes, and so you end up
with all these casts back and forth spread all over.

I started by merging all .hugetlb_entry functions into the .pmd_entry ones
(not done yet, half-way through) and creating .pud_entry callbacks, because
we will need them since hugetlb can be PUD-mapped, unlike THP (well, yes,
devmap, but most walkers do not care about it so they did not create a
.pud_entry).
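
To make the direction a bit more concrete, here is a toy userspace sketch
(not kernel code, and not what the patches look like). It only models the
idea that, without a separate hugetlb callback, a walker sees hugetlb leaves
through the same per-level callbacks as everything else, so a walker that
never set a PUD-level callback would silently skip PUD-mapped hugetlb. All
toy_* names are made up for illustration, and locking is ignored entirely.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: these are NOT the kernel's page-walk types. */
enum toy_level { TOY_PTE, TOY_PMD, TOY_PUD };

struct toy_entry {
	enum toy_level level;	/* level at which this mapping is a leaf */
	int is_hugetlb;		/* non-zero if the leaf is hugetlb-backed */
	unsigned long size;	/* bytes covered by the leaf */
};

typedef int (*toy_cb)(const struct toy_entry *e, void *priv);

/* One callback per level; there is no separate hugetlb callback. */
struct toy_walk_ops {
	toy_cb pud_entry;
	toy_cb pmd_entry;
	toy_cb pte_entry;
};

/* Dispatch each leaf to the callback of its level, if the walker set one. */
static int toy_walk(const struct toy_entry *entries, size_t n,
		    const struct toy_walk_ops *ops, void *priv)
{
	assert(ops != NULL);

	for (size_t i = 0; i < n; i++) {
		toy_cb cb = NULL;

		switch (entries[i].level) {
		case TOY_PUD: cb = ops->pud_entry; break;
		case TOY_PMD: cb = ops->pmd_entry; break;
		case TOY_PTE: cb = ops->pte_entry; break;
		}
		if (cb) {
			int ret = cb(&entries[i], priv);

			if (ret)
				return ret;	/* non-zero stops the walk */
		}
	}
	return 0;
}

/* Example callback: accumulate the number of bytes seen. */
static int toy_count_bytes(const struct toy_entry *e, void *priv)
{
	*(unsigned long *)priv += e->size;
	return 0;
}

/* Walk a fixed set of leaves, with or without a PUD-level callback. */
static unsigned long toy_demo_total(int with_pud)
{
	const struct toy_entry entries[] = {
		{ TOY_PUD, 1, 1UL << 30 },	/* PUD-mapped hugetlb */
		{ TOY_PMD, 1, 2UL << 20 },	/* PMD-mapped hugetlb */
		{ TOY_PMD, 0, 2UL << 20 },	/* PMD-mapped THP */
		{ TOY_PTE, 0, 4096UL },		/* regular page */
	};
	struct toy_walk_ops ops = {
		.pud_entry = with_pud ? toy_count_bytes : NULL,
		.pmd_entry = toy_count_bytes,
		.pte_entry = toy_count_bytes,
	};
	unsigned long total = 0;

	toy_walk(entries, sizeof(entries) / sizeof(entries[0]), &ops, &total);
	return total;
}
```

Calling toy_demo_total(0) leaves the PUD-mapped leaf out of the total, while
toy_demo_total(1) accounts for it, which is roughly why the walkers need a
.pud_entry once .hugetlb_entry goes away.
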
Then I will run some tests on x86_64/arm64/ppc64le and s390 (not sure if I
will be able to grab one, but let us see), and then I will post a patchset
as an RFC to gather some feedback.

[1] https://github.com/leberus/linux/tree/hugetlb-pagewalk-v2
[2] Do not stare too closely, as they are very much WIP, and ignore the
    last 4 commits as they are half-done.
[3] https://patchwork.kernel.org/project/linux-mm/cover/cover.1716815901.git.christophe.leroy@xxxxxxxxxx/

-- 
Oscar Salvador
SUSE Labs