On Mon, Dec 4, 2023 at 11:21 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> In preparation for adding support for anonymous multi-size THP,
> introduce new sysfs structure that will be used to control the new
> behaviours. A new directory is added under transparent_hugepage for each
> supported THP size, and contains an `enabled` file, which can be set to
> "inherit" (to inherit the global setting), "always", "madvise" or
> "never". For now, the kernel still only supports PMD-sized anonymous
> THP, so only 1 directory is populated.
>
> The first half of the change converts transhuge_vma_suitable() and
> hugepage_vma_check() so that they take a bitfield of orders for which
> the user wants to determine support, and the functions filter out all
> the orders that can't be supported, given the current sysfs
> configuration and the VMA dimensions. If there is only 1 order set in
> the input then the output can continue to be treated like a boolean;
> this is the case for most call sites. The resulting functions are
> renamed to thp_vma_suitable_orders() and thp_vma_allowable_orders()
> respectively.
>
> The second half of the change implements the new sysfs interface. It has
> been done so that each supported THP size has a `struct thpsize`, which
> describes the relevant metadata and is itself a kobject. This is pretty
> minimal for now, but should make it easy to add new per-thpsize files to
> the interface if needed in future (e.g. per-size defrag). Rather than
> keep the `enabled` state directly in the struct thpsize, I've elected to
> directly encode it into huge_anon_orders_[always|madvise|inherit]
> bitfields since this reduces the amount of work required in
> thp_vma_allowable_orders() which is called for every page fault.
>
> See Documentation/admin-guide/mm/transhuge.rst, as modified by this
> commit, for details of how the new sysfs interface works.
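For readers following along, the order-bitfield filtering described above can
be sketched roughly as below. This is only a userspace illustration, not the
code from the patch: struct thp_config, enum global_mode, allowable_orders()
and suitable_orders() are hypothetical names, and PMD_ORDER = 9 with 4K pages
is an assumption. The three orders_* fields stand in for the
huge_anon_orders_[always|madvise|inherit] bitfields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* All names and constants below are illustrative assumptions. */
#define SKETCH_PAGE_SIZE  4096ULL   /* assumed 4K base pages */
#define PMD_ORDER         9         /* assumed: 2M PMD with 4K pages */
#define SKETCH_MAX_ORDER  20        /* arbitrary cap for this sketch */
#define BIT(n)            (1UL << (n))

enum global_mode { GLOBAL_NEVER, GLOBAL_MADVISE, GLOBAL_ALWAYS };

/* Stand-in for the per-order bitfields plus the top-level "enabled". */
struct thp_config {
	unsigned long orders_always;
	unsigned long orders_madvise;
	unsigned long orders_inherit;
	enum global_mode global;
};

/* Filter the requested orders by the (assumed) sysfs configuration. */
static unsigned long allowable_orders(const struct thp_config *cfg,
				      bool vma_madvised, unsigned long orders)
{
	unsigned long mask = cfg->orders_always;

	if (vma_madvised)
		mask |= cfg->orders_madvise;
	/* "inherit" sizes follow whatever the top-level setting allows. */
	if (cfg->global == GLOBAL_ALWAYS ||
	    (cfg->global == GLOBAL_MADVISE && vma_madvised))
		mask |= cfg->orders_inherit;

	return orders & mask;
}

/* Keep only orders whose naturally aligned block around addr fits the VMA. */
static unsigned long suitable_orders(uint64_t start, uint64_t end,
				     uint64_t addr, unsigned long orders)
{
	unsigned long out = 0;

	assert(start <= end);
	for (int order = 0; order <= SKETCH_MAX_ORDER; order++) {
		uint64_t size, haddr;

		if (!(orders & BIT(order)))
			continue;
		size = SKETCH_PAGE_SIZE << order;
		haddr = addr & ~(size - 1);	/* round down to block start */
		if (haddr >= start && haddr + size <= end)
			out |= BIT(order);
	}
	return out;
}
```

A caller that only cares about one size passes a single bit, e.g.
BIT(PMD_ORDER), and can treat a non-zero result as "yes", which is the
boolean-like usage the commit message mentions.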
>
> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>

Reviewed-by: Barry Song <v-songbaohua@xxxxxxxx>

> -khugepaged will be automatically started when
> -transparent_hugepage/enabled is set to "always" or "madvise, and it'll
> -be automatically shutdown if it's set to "never".
> +khugepaged will be automatically started when one or more hugepage
> +sizes are enabled (either by directly setting "always" or "madvise",
> +or by setting "inherit" while the top-level enabled is set to "always"
> +or "madvise"), and it'll be automatically shutdown when the last
> +hugepage size is disabled (either by directly setting "never", or by
> +setting "inherit" while the top-level enabled is set to "never").
>
>  Khugepaged controls
>  -------------------
>
> +.. note::
> +   khugepaged currently only searches for opportunities to collapse to
> +   PMD-sized THP and no attempt is made to collapse to other THP
> +   sizes.

For small-size THP, collapse is probably a bad idea. On Android we prefer
a one-shot attempt, especially since we use large folio sizes of 64KB
and below. If the page fault succeeds in getting a large folio, we map
it; otherwise we give up, because that memory can be quite unstable: it
is frequently swapped out, swapped in, and madvised with MADV_DONTNEED.
Too many compactions will increase power consumption and hurt UI
responsiveness.

Thanks
Barry