On 31/10/2023 12:03, David Hildenbrand wrote:
> On 31.10.23 12:55, Ryan Roberts wrote:
>> On 31/10/2023 11:50, Ryan Roberts wrote:
>>> On 06/10/2023 21:06, David Hildenbrand wrote:
>>> [...]
>>>>
>>>> Change 2: sysfs interface.
>>>>
>>>> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
>>>> agree.
>>>>
>>>> What we expose there and how, is TBD. Again, not a friend of "orders" and
>>>> bitmaps at all. We can do better if we want to go down that path.
>>>>
>>>> Maybe we should take a look at hugetlb, and how they added support for multiple
>>>> sizes. What *might* make sense could be (depending on which values we actually
>>>> support!)
>>>>
>>>>
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
>>>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
>>>>
>>>> Each one would contain an "enabled" and "defrag" file. We want something minimal
>>>> first? Start with the "enabled" option.
>>>>
>>>>
>>>> enabled: always [global] madvise never
>>>>
>>>> Initially, we would set it for PMD-sized THP to "global" and for everything else
>>>> to "never".
>>>
>>> Hi David,
>>>
>>> I've just started coding this, and it occurs to me that I might need a small
>>> clarification here; the existing global "enabled" control is used to drive
>>> decisions for both anonymous memory and (non-shmem) file-backed memory. But the
>>> proposed new per-size "enabled" is implicitly only controlling anon memory (for
>>> now).
>>>
>>> 1) Is this potentially confusing for the user? Should we rename the per-size
>>> controls to "anon_enabled"? Or is it preferable to just keep it vague for now so
>>> we can reuse the same control for file-backed memory in future?
>>>
>>> 2) The global control will continue to drive the file-backed memory decision
>>> (for now), even when hugepages-2048kB/enabled != "global"; agreed?
>>>
>>> Thanks,
>>> Ryan
>>>
>>
>> Also, an implementation question:
>>
>> hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs
>> (although it does honour MADV_NOHUGEPAGE and the prctl); it will return true
>> regardless. Is that by design? I couldn't fathom any reasoning from the
>> commit log:
>
> The whole DAX "hugepage" and THP mixup is just plain confusing. We're simply
> using PUD/PMD mappings of DAX memory, and PMD/PTE-remap when required (VMA
> split I assume, COW).
>
> It doesn't result in any memory waste, so who really cares how it's mapped?
> Apparently we want individual processes to just disable PMD/PUD mappings of DAX
> using the prctl and madvise. Maybe there are good reasons.
>
> Looks like a design decision, probably some legacy leftovers.

OK, I'll ensure I keep this behaviour. Thanks!

>
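
For illustration only, here is a rough model of how the per-size "enabled" values discussed above might resolve for anonymous memory. It is a sketch and not kernel code. It assumes that "global" means "inherit the top-level enabled setting", which is my reading of the proposal, and that the PMD size is 2048kB. The function names are hypothetical, and the model deliberately ignores MADV_NOHUGEPAGE, the prctl, DAX and file-backed memory.

```python
# Illustrative model of the proposed per-size "enabled" control.
# Not kernel code: all names here are hypothetical.

GLOBAL_MODES = {"always", "madvise", "never"}
PER_SIZE_MODES = GLOBAL_MODES | {"global"}


def effective_mode(per_size_mode, global_mode):
    """Resolve a per-size setting against the top-level setting.

    Assumption: "global" means "inherit the top-level value".
    """
    if global_mode not in GLOBAL_MODES:
        raise ValueError(f"bad global mode: {global_mode!r}")
    if per_size_mode not in PER_SIZE_MODES:
        raise ValueError(f"bad per-size mode: {per_size_mode!r}")
    return global_mode if per_size_mode == "global" else per_size_mode


def anon_thp_allowed(per_size_mode, global_mode, vma_madvised):
    """Would this size be eligible for an anonymous VMA in this model?"""
    mode = effective_mode(per_size_mode, global_mode)
    if mode == "always":
        return True
    if mode == "madvise":
        return vma_madvised  # only VMAs marked with MADV_HUGEPAGE
    return False  # "never"


def initial_per_size_settings(sizes_kb, pmd_size_kb=2048):
    """Proposed defaults: PMD size -> "global", every other size -> "never"."""
    return {
        f"hugepages-{size}kB": "global" if size == pmd_size_kb else "never"
        for size in sizes_kb
    }
```

With these defaults the PMD-sized entry simply follows the top-level control, so existing behaviour would be preserved until a per-size value is changed explicitly.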