On 31/10/2023 11:58, David Hildenbrand wrote:
> On 31.10.23 12:50, Ryan Roberts wrote:
>> On 06/10/2023 21:06, David Hildenbrand wrote:
>> [...]
>>>
>>> Change 2: sysfs interface.
>>>
>>> If we call it THP, it shall go under "/sys/kernel/mm/transparent_hugepage/", I
>>> agree.
>>>
>>> What we expose there and how, is TBD. Again, not a friend of "orders" and
>>> bitmaps at all. We can do better if we want to go down that path.
>>>
>>> Maybe we should take a look at hugetlb, and how they added support for multiple
>>> sizes. What *might* make sense could be (depending on which values we actually
>>> support!)
>>>
>>>
>>> /sys/kernel/mm/transparent_hugepage/hugepages-64kB/
>>> /sys/kernel/mm/transparent_hugepage/hugepages-128kB/
>>> /sys/kernel/mm/transparent_hugepage/hugepages-256kB/
>>> /sys/kernel/mm/transparent_hugepage/hugepages-512kB/
>>> /sys/kernel/mm/transparent_hugepage/hugepages-1024kB/
>>> /sys/kernel/mm/transparent_hugepage/hugepages-2048kB/
>>>
>>> Each one would contain an "enabled" and "defrag" file. We want something minimal
>>> first? Start with the "enabled" option.
>>>
>>>
>>> enabled: always [global] madvise never
>>>
>>> Initially, we would set it for PMD-sized THP to "global" and for everything else
>>> to "never".
>>
>> Hi David,
>
> Hi!
>
>>
>> I've just started coding this, and it occurs to me that I might need a small
>> clarification here; the existing global "enabled" control is used to drive
>> decisions for both anonymous memory and (non-shmem) file-backed memory. But the
>> proposed new per-size "enabled" is implicitly only controlling anon memory (for
>> now).
>
> Anon was (way) first, and pagecache later decided to reuse that one as an
> indication whether larger folios are desired.
>
> For the pagecache, it's just a way to enable/disable it globally. As there is no
> memory waste, nobody currently really cares about the exact sizes the pagecache
> is allocating (maybe that will change at some point, maybe not, who knows).

Yup.
It's not _just_ about allocation though; it's also about collapse (MADV_COLLAPSE,
khugepaged), which is supported for pagecache pages. I can imagine value in
collapsing to various sizes that are beneficial for HW... anyway, that's for
another day.

>
>>
>> 1) Is this potentially confusing for the user? Should we rename the per-size
>> controls to "anon_enabled"? Or is it preferable to just keep it vague for now so
>> we can reuse the same control for file-backed memory in future?
>
> The latter would be my take. Just like we did with the global toggle.

ACK

>
>>
>> 2) The global control will continue to drive the file-backed memory decision
>> (for now), even when hugepages-2048kB/enabled != "global"; agreed?
>
> That would be my take; it will allocate other sizes already, so just glue it to
> the global toggle and document for the other toggles that they only control
> anonymous THP for now.

ACK

>
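To make the policy agreed above concrete, here is a minimal C sketch of how a per-size "enabled" value might be resolved. Everything in it is hypothetical: the enum, the function names, and the assumption that 2048kB is the PMD size (true for 4 KiB base pages) are illustrative only, not kernel code and not the eventual patch. It models three points from the thread: a per-size "global" defers to the top-level toggle, PMD-sized THP defaults to "global" while every other size defaults to "never", and file-backed memory consults only the top-level toggle for now.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical modes mirroring "always [global] madvise never". */
enum thp_mode { THP_NEVER, THP_MADVISE, THP_ALWAYS, THP_GLOBAL };

/* Assumption: PMD size is 2048kB, as with 4 KiB base pages. */
#define HYPOTHETICAL_PMD_KB 2048u

/* Initial per-size default: PMD-sized follows the global toggle, others are off. */
static enum thp_mode default_mode_for_kb(unsigned int size_kb)
{
	return size_kb == HYPOTHETICAL_PMD_KB ? THP_GLOBAL : THP_NEVER;
}

/* Does a concrete (non-"global") mode allow a large folio for this VMA? */
static bool mode_allows(enum thp_mode m, bool vma_madvised)
{
	switch (m) {
	case THP_ALWAYS:
		return true;
	case THP_MADVISE:
		return vma_madvised;
	default:
		return false;
	}
}

/* Anonymous memory: the per-size control decides; "global" defers upward. */
static bool anon_size_allowed(enum thp_mode per_size, enum thp_mode top_level,
			      bool vma_madvised)
{
	/* The top-level toggle has no "global" option of its own. */
	assert(top_level != THP_GLOBAL);
	enum thp_mode effective = (per_size == THP_GLOBAL) ? top_level : per_size;
	return mode_allows(effective, vma_madvised);
}

/* File-backed (non-shmem) memory: only the top-level toggle is consulted. */
static bool file_large_folios_allowed(enum thp_mode top_level, bool vma_madvised)
{
	return mode_allows(top_level, vma_madvised);
}
```

For example, with the top-level toggle at "always", a 64kB control left at its default "never" keeps anonymous 64kB THP off, while file-backed memory is still governed by "always", matching point 2 above.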