On Fri, Oct 6, 2023 at 2:08 PM David Hildenbrand <david@xxxxxxxxxx> wrote: > > On 29.09.23 13:44, Ryan Roberts wrote: > > In addition to passing a bitfield of folio orders to enable for THP, > > allow the string "recommend" to be written, which has the effect of > > causing the system to enable the orders preferred by the architecture > > and by the mm. The user can see what these orders are by subsequently > > reading back the file. > > > > Note that these recommended orders are expected to be static for a given > > boot of the system, and so the keyword "auto" was deliberately not used, > > as I want to reserve it for a possible future use where the "best" order > > is chosen more dynamically at runtime. > > > > Recommended orders are determined as follows: > > - PMD_ORDER: The traditional THP size > > - arch_wants_pte_order() if implemented by the arch > > - PAGE_ALLOC_COSTLY_ORDER: The largest order kept on per-cpu free list > > > > arch_wants_pte_order() can be overridden by the architecture if desired. > > Some architectures (e.g. arm64) can coalsece TLB entries if a contiguous > > set of ptes map physically contigious, naturally aligned memory, so this > > mechanism allows the architecture to optimize as required. > > > > Here we add the default implementation of arch_wants_pte_order(), used > > when the architecture does not define it, which returns -1, implying > > that the HW has no preference. > > > > Signed-off-by: Ryan Roberts <ryan.roberts@xxxxxxx> > > --- > > Documentation/admin-guide/mm/transhuge.rst | 4 ++++ > > include/linux/pgtable.h | 13 +++++++++++++ > > mm/huge_memory.c | 14 +++++++++++--- > > 3 files changed, 28 insertions(+), 3 deletions(-) > > > > diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst > > index 732c3b2f4ba8..d6363d4efa3a 100644 > > --- a/Documentation/admin-guide/mm/transhuge.rst > > +++ b/Documentation/admin-guide/mm/transhuge.rst > > @@ -187,6 +187,10 @@ pages (=16K if the page size is 4K). The example above enables order-9 > > By enabling multiple orders, allocation of each order will be > > attempted, highest to lowest, until a successful allocation is made. > > If the PMD-order is unset, then no PMD-sized THPs will be allocated. > > +It is also possible to enable the recommended set of orders, which > > +will be optimized for the architecture and mm:: > > + > > + echo recommend >/sys/kernel/mm/transparent_hugepage/anon_orders > > > > The kernel will ignore any orders that it does not support so read the > > file back to determine which orders are enabled:: > > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h > > index af7639c3b0a3..0e110ce57cc3 100644 > > --- a/include/linux/pgtable.h > > +++ b/include/linux/pgtable.h > > @@ -393,6 +393,19 @@ static inline void arch_check_zapped_pmd(struct vm_area_struct *vma, > > } > > #endif > > > > +#ifndef arch_wants_pte_order > > +/* > > + * Returns preferred folio order for pte-mapped memory. Must be in range [0, > > + * PMD_ORDER) and must not be order-1 since THP requires large folios to be at > > + * least order-2. Negative value implies that the HW has no preference and mm > > + * will choose it's own default order. > > + */ > > +static inline int arch_wants_pte_order(void) > > +{ > > + return -1; > > +} > > +#endif > > + > > #ifndef __HAVE_ARCH_PTEP_GET_AND_CLEAR > > static inline pte_t ptep_get_and_clear(struct mm_struct *mm, > > unsigned long address, > > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > > index bcecce769017..e2e2d3906a21 100644 > > --- a/mm/huge_memory.c > > +++ b/mm/huge_memory.c > > @@ -464,10 +464,18 @@ static ssize_t anon_orders_store(struct kobject *kobj, > > int err; > > int ret = count; > > unsigned int orders; > > + int arch; > > > > - err = kstrtouint(buf, 0, &orders); > > - if (err) > > - ret = -EINVAL; > > + if (sysfs_streq(buf, "recommend")) { > > + arch = max(arch_wants_pte_order(), PAGE_ALLOC_COSTLY_ORDER); > > + orders = BIT(arch); > > + orders |= BIT(PAGE_ALLOC_COSTLY_ORDER); > > + orders |= BIT(PMD_ORDER); > > + } else { > > + err = kstrtouint(buf, 0, &orders); > > + if (err) > > + ret = -EINVAL; > > + } > > > > if (ret > 0) { > > orders &= THP_ORDERS_ALL_ANON; > > :/ don't really like that. Regarding my proposal, one could have > something like that in an "auto" setting for the "enabled" value, or a > "recommended" setting [not sure]. Me either. Again this is something I call random -- we only discussed "auto", and yes, the commit message above explained why "recommended" here but it has never surfaced in previous discussions, has it? If so, this reinforces what I said here [1]. [1] https://lore.kernel.org/mm-commits/CAOUHufYEKx5_zxRJkeqrmnStFjR+pVQdpZ40ATSTaxLA_iRPGw@xxxxxxxxxxxxxx/