Re: [LSF/MM/BPF TOPIC] HugeTLB generic pagewalk

On 30.01.25 22:36, Oscar Salvador wrote:
Hi,

last year Peter Xu did a presentation at LSF/MM on how to better integrate hugetlb
in the mm core.
There are several reasons we want to do that, but one could say that the two that
matter the most are 1) code duplication and 2) making hugetlb less special.

During the last year several patches that went in that direction were merged, e.g.:
gup hugetlb unify [1], mprotect for dax PUDs [2], hugetlb into generic unmapping
path [3], to name some.

There was also a concern on how to integrate hugetlb into the generic pagewalk,
which would get rid of a lot of code and give us a generic path that could handle
everything.
This was first worked on in [4] (very basic draft).

Although a second version is in the works, I would like to present some concerns
I have wrt. that work.

Hi Oscar,


HugeTLB has its own way of dealing with things.
E.g: HugeTLB interprets everything as a pte: huge_pte_uffd_wp, huge_pte_clear_uffd_wp,
huge_pte_dirty, huge_pte_modify, huge_pte_wrprotect etc.

One of the challenges that this raises is that if we want pmd/pud walkers to
be able to make sense of hugetlb stuff, we need to implement pud/pmd
(maybe some pmd we already have because of THP) variants of those.

That's the easy case, I'm afraid. The real problem is cont-pte constructs (or worse)
abstracted by hugetlb to be a single unit ("hugetlb pte").

For "ordinary" pages, the cont-pte bit (as on arm64) is nowadays transparently
managed: you can modify any PTE that is part of the cont-gang and it will just
work as expected.

Not so with hugetlb, where you have to modify (or even query) the whole thing.
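
To illustrate the "query the whole thing" part, here is a minimal user-space
sketch, not kernel code. It assumes a toy layout in which one logical hugetlb
entry is backed by 16 contiguous physical entries, and in which hardware may
set the dirty/young bits on any one of them. All toy_* names, the bit layout
and the span of 16 are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16 physical entries back one logical "hugetlb pte". */
#define TOY_CONT_PTES 16u
#define TOY_PTE_DIRTY (1u << 0)
#define TOY_PTE_YOUNG (1u << 1)

typedef uint64_t toy_pte_t;

/*
 * Read the logical entry: start from the first physical entry and fold in
 * the dirty/young bits found on any of the other entries in the span.
 */
static toy_pte_t toy_huge_pte_get(const toy_pte_t *first)
{
	toy_pte_t pte;
	size_t i;

	assert(first != NULL);
	pte = first[0];
	for (i = 1; i < TOY_CONT_PTES; i++)
		pte |= first[i] & (TOY_PTE_DIRTY | TOY_PTE_YOUNG);
	return pte;
}

/* Dirty state of the logical entry, not of a single physical entry. */
static bool toy_huge_pte_dirty(const toy_pte_t *first)
{
	return (toy_huge_pte_get(first) & TOY_PTE_DIRTY) != 0;
}
```

A walker that only looks at the physical entry it happens to stand on would
miss a dirty bit recorded on a sibling entry in this model.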

For GUP it was easier, because it was able to grab all information it needed
from the sub-ptes fairly easily, and it doesn't modify any page tables.


I ran into this problem with folio_walk, and had to document it rather nastily:

 * WARNING: Modifying page table entries in hugetlb VMAs requires a lot of care.
 * For example, PMD page table sharing might require prior unsharing. Also,
 * logical hugetlb entries might span multiple physical page table entries,
 * which *must* be modified in a single operation (set_huge_pte_at(),
 * huge_ptep_set_*, ...). Note that the page table entry stored in @fw might
 * not correspond to the first physical entry of a logical hugetlb entry.
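
The last two points of that comment can be sketched in a toy user-space
model, again not kernel code. It assumes a logical entry spans 16 naturally
aligned physical entries and that each entry holds flag bits only (a real
entry would also carry a per-entry physical address). The toy_* names are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 16 naturally aligned physical entries per logical entry. */
#define TOY_CONT_PTES 16u

typedef uint64_t toy_pte_t;

/* Index of the first physical entry of the span that contains idx. */
static size_t toy_cont_first_index(size_t idx)
{
	return idx & ~((size_t)TOY_CONT_PTES - 1);
}

/*
 * Update the logical entry that contains idx. The caller may pass the index
 * of any physical entry in the span; all entries are rewritten in one call
 * so the span never ends up half-updated from the caller's point of view.
 */
static void toy_set_huge_pte_at(toy_pte_t *table, size_t idx, toy_pte_t val)
{
	size_t first, i;

	assert(table != NULL);
	first = toy_cont_first_index(idx);
	for (i = 0; i < TOY_CONT_PTES; i++)
		table[first + i] = val;
}
```

For example, calling toy_set_huge_pte_at(table, 21, val) on a 32-entry table
rewrites entries 16 through 31 and leaves entries 0 through 15 alone.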

I wanted to use it to rewrite the uprobe code to also handle hugetlb with
less special casing, but that work has stalled so far. I think my next attempt would
rule out any non-pmd / non-pud hugetlb pages to make it somewhat simpler.

It all gets weird with things like:

commit 0549e76663730235a10395a7af7ad3d3ce6e2402
Author: Christophe Leroy <christophe.leroy@xxxxxxxxxx>
Date:   Tue Jul 2 15:51:25 2024 +0200

    powerpc/8xx: rework support for 8M pages using contiguous PTE entries

    In order to fit better with standard Linux page tables layout, add support
    for 8M pages using contiguous PTE entries in a standard page table.  Page
    tables will then be populated with 1024 similar entries and two PMD
    entries will point to that page table.

    The PMD entries also get a flag to tell it is addressing an 8M page, this
    is required for the HW tablewalk assistance.

Here we are walking one PTE table, but there is actually another PTE table we
have to modify in the same go.


Very hard to make that non-hugetlb aware, as it's simply completely different compared
to ordinary page table walking/modifications today.

Maybe there are ideas to tackle that, and I'd be very interested in them.

--
Cheers,

David / dhildenb




