On Tue, Oct 31, 2023 at 4:55 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> On 31/10/2023 11:50, Ryan Roberts wrote:
> > On 06/10/2023 21:06, David Hildenbrand wrote:
> > [...]
> >>
> >> Change 2: sysfs interface.
> >>
> >> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
> >> agree.
> >>
> >> What we expose there and how, is TBD. Again, not a friend of "orders" and
> >> bitmaps at all. We can do better if we want to go down that path.
> >>
> >> Maybe we should take a look at hugetlb, and how they added support for multiple
> >> sizes. What *might* make sense could be (depending on which values we actually
> >> support!)
> >>
> >>
> >> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
> >> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
> >>
> >> Each one would contain an "enabled" and "defrag" file. We want something minimal
> >> first? Start with the "enabled" option.
> >>
> >>
> >> enabled: always [global] madvise never
> >>
> >> Initially, we would set it for PMD-sized THP to "global" and for everything else
> >> to "never".
> >
> > Hi David,
> >
> > I've just started coding this, and it occurs to me that I might need a small
> > clarification here; the existing global "enabled" control is used to drive
> > decisions for both anonymous memory and (non-shmem) file-backed memory. But the
> > proposed new per-size "enabled" is implicitly only controlling anon memory (for
> > now).
> >
> > 1) Is this potentially confusing for the user? Should we rename the per-size
> > controls to "anon_enabled"? Or is it preferable to just keep it vague for now so
> > we can reuse the same control for file-backed memory in future?
> >
> > 2) The global control will continue to drive the file-backed memory decision
> > (for now), even when hugepages-2048kB/enabled != "global"; agreed?
> >
> > Thanks,
> > Ryan
> >
>
> Also, an implementation question:
>
> hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs
> (although it does honour MADV_NOHUGEPAGE and the prctl); it will return true
> regardless. Is that by design? I couldn't fathom any reasoning from the commit log:

The enabled="never" setting is for anonymous VMAs; DAX VMAs are typically file VMAs.

>
> bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>                         bool smaps, bool in_pf, bool enforce_sysfs)
> {
>         if (!vma->vm_mm)                /* vdso */
>                 return false;
>
>         /*
>          * Explicitly disabled through madvise or prctl, or some
>          * architectures may disable THP for some mappings, for
>          * example, s390 kvm.
>          * */
>         if ((vm_flags & VM_NOHUGEPAGE) ||
>             test_bit(MMF_DISABLE_THP, &vma->vm_mm->flags))
>                 return false;
>         /*
>          * If the hardware/firmware marked hugepage support disabled.
>          */
>         if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_UNSUPPORTED))
>                 return false;
>
>         /* khugepaged doesn't collapse DAX vma, but page fault is fine. */
>         if (vma_is_dax(vma))
>                 return in_pf; <<<<<<<<
>
>         ...
> }
>
>
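
To make the ordering of the quoted checks concrete, here is a minimal userspace sketch, not kernel code. It only models the early returns shown above: the DAX case is decided before any sysfs "enabled" value is consulted. All names (sketch_vma, sketch_thp_mode, sketch_vma_check) are hypothetical, and the final switch is an assumed, simplified policy for non-DAX VMAs; it does not reproduce the part of the function that is elided above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the sysfs "enabled" setting. */
enum sketch_thp_mode { SKETCH_THP_NEVER, SKETCH_THP_MADVISE, SKETCH_THP_ALWAYS };

/* Hypothetical stand-in for the few VMA properties the quoted checks read. */
struct sketch_vma {
    bool has_mm;        /* false models a VMA without vm_mm (vdso) */
    bool is_dax;        /* models vma_is_dax(vma) */
    bool nohugepage;    /* models VM_NOHUGEPAGE or MMF_DISABLE_THP */
    bool madv_hugepage; /* models an madvise(MADV_HUGEPAGE) request */
};

static bool sketch_vma_check(const struct sketch_vma *vma,
                             enum sketch_thp_mode enabled,
                             bool hw_unsupported, bool in_pf)
{
    if (!vma->has_mm)
        return false;

    /* Explicit opt-out via madvise or prctl wins over everything below. */
    if (vma->nohugepage)
        return false;

    /* Hardware/firmware marked hugepage support as disabled. */
    if (hw_unsupported)
        return false;

    /*
     * DAX is decided here, before "enabled" is looked at, so
     * enabled == SKETCH_THP_NEVER has no effect on DAX VMAs.
     */
    if (vma->is_dax)
        return in_pf;

    /* Assumed simplified policy for non-DAX VMAs (not the elided kernel code). */
    switch (enabled) {
    case SKETCH_THP_ALWAYS:
        return true;
    case SKETCH_THP_MADVISE:
        return vma->madv_hugepage;
    default:
        return false;
    }
}
```

With this model, a DAX VMA passes the check in the page fault path even when the mode is SKETCH_THP_NEVER, while an opt-out through nohugepage is still honoured, which matches the behaviour described in the question.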