On Wed, Jul 17, 2024 at 7:13 PM Ryan Roberts <ryan.roberts@xxxxxxx> wrote:
>
> Add thp_anon= cmdline parameter to allow specifying the default
> enablement of each supported anon THP size. The parameter accepts the
> following format and can be provided multiple times to configure each
> size:
>
> thp_anon=<size>[KMG]:<value>
>
> See Documentation/admin-guide/mm/transhuge.rst for more details.
>
> Configuring the defaults at boot time is useful to allow early user
> space to take advantage of mTHP before its been configured through
> sysfs.

This is exactly what I need and was planning to implement, as the current
behavior is problematic. Today we have to boot the system and get far
enough to configure mTHP through the sysfs interfaces, so many processes
started before that point miss the opportunity to use mTHP. In addition,
userspace components such as a .so library may have been tuned to detect
whether mTHP is enabled, and they end up seeing inconsistent settings
between the two stages: before and after mTHP is enabled through sysfs.

>
> Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx>
> ---
>  .../admin-guide/kernel-parameters.txt      |  8 +++
>  Documentation/admin-guide/mm/transhuge.rst | 26 +++++++--
>  mm/huge_memory.c                           | 55 ++++++++++++++++++-
>  3 files changed, 82 insertions(+), 7 deletions(-)
>
> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
> index bc55fb55cd26..48443ad12e3f 100644
> --- a/Documentation/admin-guide/kernel-parameters.txt
> +++ b/Documentation/admin-guide/kernel-parameters.txt
> @@ -6592,6 +6592,14 @@
>  			<deci-seconds>: poll all this frequency
>  			0: no polling (default)
>
> +	thp_anon=	[KNL]
> +			Format: <size>[KMG]:always|madvise|never|inherit
> +			Can be used to control the default behavior of the
> +			system with respect to anonymous transparent hugepages.
> +			Can be used multiple times for multiple anon THP sizes.
> +			See Documentation/admin-guide/mm/transhuge.rst for more
> +			details.
> +
>  	threadirqs	[KNL,EARLY]
>  			Force threading of all interrupt handlers except those
>  			marked explicitly IRQF_NO_THREAD.
> diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
> index 1aaf8e3a0b5a..f53d43d986e2 100644
> --- a/Documentation/admin-guide/mm/transhuge.rst
> +++ b/Documentation/admin-guide/mm/transhuge.rst
> @@ -311,13 +311,27 @@ performance.
>  Note that any changes to the allowed set of sizes only applies to future
>  file-backed THP allocations.
>
> -Boot parameter
> -==============
> +Boot parameters
> +===============
>
> -You can change the sysfs boot time defaults of Transparent Hugepage
> -Support by passing the parameter ``transparent_hugepage=always`` or
> -``transparent_hugepage=madvise`` or ``transparent_hugepage=never``
> -to the kernel command line.
> +You can change the sysfs boot time default for the top-level "enabled"
> +control by passing the parameter ``transparent_hugepage=always`` or
> +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the
> +kernel command line.
> +
> +Alternatively, each supported anonymous THP size can be controlled by
> +passing ``thp_anon=<size>[KMG]:<state>``, where ``<size>`` is the THP size
> +and ``<state>`` is one of ``always``, ``madvise``, ``never`` or
> +``inherit``.
> +
> +For example, the following will set 64K THP to ``always``::
> +
> +   thp_anon=64K:always
> +
> +``thp_anon=`` may be specified multiple times to configure all THP sizes as
> +required. If ``thp_anon=`` is specified at least once, any anon THP sizes
> +not explicitly configured on the command line are implicitly set to
> +``never``.
>
>  Hugepages in tmpfs/shmem
>  ========================
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 4249c0bc9388..794d2790d90d 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -82,6 +82,7 @@ unsigned long huge_anon_orders_madvise __read_mostly;
>  unsigned long huge_anon_orders_inherit __read_mostly;
>  unsigned long huge_file_orders_always __read_mostly;
>  int huge_file_exec_order __read_mostly = -1;
> +static bool anon_orders_configured;
>
>  unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma,
>  					 unsigned long vm_flags,
> @@ -763,7 +764,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj)
>  	 * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time
>  	 * constant so we have to do this here.
>  	 */
> -	huge_anon_orders_inherit = BIT(PMD_ORDER);
> +	if (!anon_orders_configured) {
> +		huge_anon_orders_inherit = BIT(PMD_ORDER);
> +		anon_orders_configured = true;
> +	}
>
>  	/*
>  	 * For pagecache, default to enabling all orders. powerpc's PMD_ORDER
> @@ -955,6 +959,55 @@ static int __init setup_transparent_hugepage(char *str)
>  }
>  __setup("transparent_hugepage=", setup_transparent_hugepage);
>
> +static int __init setup_thp_anon(char *str)
> +{
> +	unsigned long size;
> +	char *state;
> +	int order;
> +	int ret = 0;
> +
> +	if (!str)
> +		goto out;
> +
> +	size = (unsigned long)memparse(str, &state);
> +	order = ilog2(size >> PAGE_SHIFT);
> +	if (*state != ':' || !is_power_of_2(size) || size <= PAGE_SIZE ||
> +	    !(BIT(order) & THP_ORDERS_ALL_ANON))
> +		goto out;
> +
> +	state++;
> +
> +	if (!strcmp(state, "always")) {
> +		clear_bit(order, &huge_anon_orders_inherit);
> +		clear_bit(order, &huge_anon_orders_madvise);
> +		set_bit(order, &huge_anon_orders_always);
> +		ret = 1;
> +	} else if (!strcmp(state, "inherit")) {
> +		clear_bit(order, &huge_anon_orders_always);
> +		clear_bit(order, &huge_anon_orders_madvise);
> +		set_bit(order, &huge_anon_orders_inherit);
> +		ret = 1;
> +	} else if (!strcmp(state, "madvise")) {
> +		clear_bit(order, &huge_anon_orders_always);
> +		clear_bit(order, &huge_anon_orders_inherit);
> +		set_bit(order, &huge_anon_orders_madvise);
> +		ret = 1;
> +	} else if (!strcmp(state, "never")) {
> +		clear_bit(order, &huge_anon_orders_always);
> +		clear_bit(order, &huge_anon_orders_inherit);
> +		clear_bit(order, &huge_anon_orders_madvise);
> +		ret = 1;
> +	}
> +
> +	if (ret)
> +		anon_orders_configured = true;
> +out:
> +	if (!ret)
> +		pr_warn("thp_anon=%s: cannot parse, ignored\n", str);
> +	return ret;
> +}
> +__setup("thp_anon=", setup_thp_anon);
> +
>  pmd_t maybe_pmd_mkwrite(pmd_t pmd, struct vm_area_struct *vma)
>  {
>  	if (likely(vma->vm_flags & VM_WRITE))
> --
> 2.43.0
>

Thanks
Barry