On Wed, Aug 14, 2024 at 8:18 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 14.08.24 04:02, Barry Song wrote: > > From: Ryan Roberts <ryan.roberts@xxxxxxx> > > > > Add thp_anon= cmdline parameter to allow specifying the default > > enablement of each supported anon THP size. The parameter accepts the > > following format and can be provided multiple times to configure each > > size: > > > > thp_anon=<size>,<size>[KMG]:<value>;<size>-<size>[KMG]:<value> > > > > An example: > > > > thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never > > > > See Documentation/admin-guide/mm/transhuge.rst for more details. > > > > Configuring the defaults at boot time is useful to allow early user > > space to take advantage of mTHP before its been configured through > > sysfs. > > > > Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx> > > Co-developed-by: Barry Song <v-songbaohua@xxxxxxxx> > > Signed-off-by: Barry Song <v-songbaohua@xxxxxxxx> > > --- > > -v4: > > * use bitmap APIs to set and clear bits. thanks very much for > > David's comment! > > > > .../admin-guide/kernel-parameters.txt | 9 ++ > > Documentation/admin-guide/mm/transhuge.rst | 37 +++++-- > > mm/huge_memory.c | 96 ++++++++++++++++++- > > 3 files changed, 134 insertions(+), 8 deletions(-) > > > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > > index f0057bac20fb..d0d141d50638 100644 > > --- a/Documentation/admin-guide/kernel-parameters.txt > > +++ b/Documentation/admin-guide/kernel-parameters.txt > > @@ -6629,6 +6629,15 @@ > > <deci-seconds>: poll all this frequency > > 0: no polling (default) > > > > + thp_anon= [KNL] > > + Format: <size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state> > > + state is one of "always", "madvise", "never" or "inherit". > > + Can be used to control the default behavior of the > > + system with respect to anonymous transparent hugepages. > > + Can be used multiple times for multiple anon THP sizes. > > + See Documentation/admin-guide/mm/transhuge.rst for more > > + details. > > + > > threadirqs [KNL,EARLY] > > Force threading of all interrupt handlers except those > > marked explicitly IRQF_NO_THREAD. > > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst > > index 7072469de8a8..528e1a19d63f 100644 > > --- a/Documentation/admin-guide/mm/transhuge.rst > > +++ b/Documentation/admin-guide/mm/transhuge.rst > > @@ -284,13 +284,36 @@ that THP is shared. Exceeding the number would block the collapse:: > > > > A higher value may increase memory footprint for some workloads. > > > > -Boot parameter > > -============== > > - > > -You can change the sysfs boot time defaults of Transparent Hugepage > > -Support by passing the parameter ``transparent_hugepage=always`` or > > -``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` > > -to the kernel command line. > > +Boot parameters > > +=============== > > + > > +You can change the sysfs boot time default for the top-level "enabled" > > +control by passing the parameter ``transparent_hugepage=always`` or > > +``transparent_hugepage=madvise`` or ``transparent_hugepage=never`` to the > > +kernel command line. > > + > > +Alternatively, each supported anonymous THP size can be controlled by > > +passing ``thp_anon=<size>,<size>[KMG]:<state>;<size>-<size>[KMG]:<state>``, > > +where ``<size>`` is the THP size and ``<state>`` is one of ``always``, > > +``madvise``, ``never`` or ``inherit``. > > + > > +For example, the following will set 16K, 32K, 64K THP to ``always``, > > +set 128K, 512K to ``inherit``, set 256K to ``madvise`` and 1M, 2M > > +to ``never``:: > > + > > + thp_anon=16K-64K:always;128K,512K:inherit;256K:madvise;1M-2M:never > > + > > +``thp_anon=`` may be specified multiple times to configure all THP sizes as > > +required. If ``thp_anon=`` is specified at least once, any anon THP sizes > > +not explicitly configured on the command line are implicitly set to > > +``never``. > > + > > +``transparent_hugepage`` setting only affects the global toggle. If > > +``thp_anon`` is not specified, PMD_ORDER THP will default to ``inherit``. > > +However, if a valid ``thp_anon`` setting is provided by the user, the > > +PMD_ORDER THP policy will be overridden. If the policy for PMD_ORDER > > +is not defined within a valid ``thp_anon``, its policy will default to > > +``never``. > > > > Hugepages in tmpfs/shmem > > ======================== > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > index 1a12c011e2df..c5f4e97b49de 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -81,6 +81,7 @@ unsigned long huge_zero_pfn __read_mostly = ~0UL; > > unsigned long huge_anon_orders_always __read_mostly; > > unsigned long huge_anon_orders_madvise __read_mostly; > > unsigned long huge_anon_orders_inherit __read_mostly; > > +static bool anon_orders_configured; > > > > unsigned long __thp_vma_allowable_orders(struct vm_area_struct *vma, > > unsigned long vm_flags, > > @@ -737,7 +738,10 @@ static int __init hugepage_init_sysfs(struct kobject **hugepage_kobj) > > * disable all other sizes. powerpc's PMD_ORDER isn't a compile-time > > * constant so we have to do this here. > > */ > > - huge_anon_orders_inherit = BIT(PMD_ORDER); > > + if (!anon_orders_configured) { > > + huge_anon_orders_inherit = BIT(PMD_ORDER); > > + anon_orders_configured = true; > > + } > > > > *hugepage_kobj = kobject_create_and_add("transparent_hugepage", mm_kobj); > > if (unlikely(!*hugepage_kobj)) { > > @@ -922,6 +926,96 @@ static int __init setup_transparent_hugepage(char *str) > > } > > __setup("transparent_hugepage=", setup_transparent_hugepage); > > > > +static inline int get_order_from_str(const char *size_str) > > +{ > > + unsigned long size; > > + char *endptr; > > + int order; > > + > > + size = memparse(size_str, &endptr); > > Do we have to also test if is_power_of_2(), and refuse if not? For > example, what if someone would pass 3K, would the existing check catch it? no, the existing check can't catch it. I passed thp_anon=15K-64K:always, then I got 16K enabled: / # cat /sys/kernel/mm/transparent_hugepage/hugepages-16kB/enabled [always] inherit madvise never I can actually check that by: static inline int get_order_from_str(const char *size_str) { unsigned long size; char *endptr; int order; size = memparse(size_str, &endptr); if (!is_power_of_2(size >> PAGE_SHIFT)) goto err; order = get_order(size); if ((1 << order) & ~THP_ORDERS_ALL_ANON) goto err; return order; err: pr_err("invalid size %s in thp_anon boot parameter\n", size_str); return -EINVAL; } > > > + order = fls(size >> PAGE_SHIFT) - 1; > > Is this a fancy way of writing > > order = log2(size >> PAGE_SHIFT); > > ? :) I think ilog2 is implemented by fls ? > > Anyhow, if get_order() wraps that, all good. I guess it doesn't check power of 2? > > > + if ((1 << order) & ~THP_ORDERS_ALL_ANON) { > > + pr_err("invalid size %s(order %d) in thp_anon boot parameter\n", > > + size_str, order); > > + return -EINVAL; > > + } > > + > > + return order; > > +} > > Apart from that, nothing jumped at me. Please take a look at the new get_order_from_str() before I send v5 :-) > > -- > Cheers, > > David / dhildenb > Thanks Barry