On Wed, Nov 1, 2023 at 7:02 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > > On 31/10/2023 18:29, Yang Shi wrote: > > On Tue, Oct 31, 2023 at 4:55 AM Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > >> > >> On 31/10/2023 11:50, Ryan Roberts wrote: > >>> On 06/10/2023 21:06, David Hildenbrand wrote: > >>> [...] > >>>> > >>>> Change 2: sysfs interface. > >>>> > >>>> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I > >>>> agree. > >>>> > >>>> What we expose there and how, is TBD. Again, not a friend of "orders" and > >>>> bitmaps at all. We can do better if we want to go down that path. > >>>> > >>>> Maybe we should take a look at hugetlb, and how they added support for multiple > >>>> sizes. What *might* make sense could be (depending on which values we actually > >>>> support!) > >>>> > >>>> > >>>> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/ > >>>> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/ > >>>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/ > >>>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/ > >>>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/ > >>>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/ > >>>> > >>>> Each one would contain an "enabled" and "defrag" file. We want something minimal > >>>> first? Start with the "enabled" option. > >>>> > >>>> > >>>> enabled: always [global] madvise never > >>>> > >>>> Initially, we would set it for PMD-sized THP to "global" and for everything else > >>>> to "never". > >>> > >>> Hi David, > >>> > >>> I've just started coding this, and it occurs to me that I might need a small > >>> clarification here; the existing global "enabled" control is used to drive > >>> decisions for both anonymous memory and (non-shmem) file-backed memory. But the > >>> proposed new per-size "enabled" is implicitly only controlling anon memory (for > >>> now). > >>> > >>> 1) Is this potentially confusing for the user? Should we rename the per-size > >>> controls to "anon_enabled"? Or is it preferable to jsut keep it vague for now so > >>> we can reuse the same control for file-backed memory in future? > >>> > >>> 2) The global control will continue to drive the file-backed memory decision > >>> (for now), even when hugepages-2048kB/enabled != "global"; agreed? > >>> > >>> Thanks, > >>> Ryan > >>> > >> > >> Also, an implementation question: > >> > >> hugepage_vma_check() doesn't currently care whether enabled="never" for DAX VMAs > >> (although it does honour MADV_NOHUGEPAGE and the prctl); It will return true > >> regardless. Is that by design? It couldn't fathom any reasoning from the commit log: > > > > The enabled="never" is for anonymous VMAs, DAX VMAs are typically file VMAs. > > That's not quite true; enabled="never" is honoured for non-DAX/non-shmem file > VMAs (for collapse via CONFIG_READ_ONLY_THP_FOR_FS and more recently for When implementing READ_ONLY_THP_FOR_FS the file THP just can be collapsed by khugepaged, but khugepaged is started iff enabled != "never". So READ_ONLY_THP_FOR_FS has to honor it. Unfortunately there are some confusing exceptions... But anyway DAX is not the same class. > anything that implements huge_fault() - see > 7a81751fcdeb833acc858e59082688e3020bfe12). IIUC this commit just gives the vmas which implement huge_fault() a chance to handle the fault. Currently just DAX vmas implement huge_fault() in vanilla kernel AFAICT. >