On Thu, 15 Feb 2024 15:40:59 +0000 Ryan Roberts <ryan.roberts@xxxxxxx> wrote: > Change the readahead config so that if it is being requested for an > executable mapping, do a synchronous read of an arch-specified size in a > naturally aligned manner. Some nits: > --- a/arch/arm64/include/asm/pgtable.h > +++ b/arch/arm64/include/asm/pgtable.h > @@ -1115,6 +1115,18 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf, > */ > #define arch_wants_old_prefaulted_pte cpu_has_hw_af > > +/* > + * Request exec memory is read into pagecache in at least 64K folios. The > + * trade-off here is performance improvement due to storing translations more > + * effciently in the iTLB vs the potential for read amplification due to reading "efficiently" > + * data from disk that won't be used. The latter is independent of base page > + * size, so we set a page-size independent block size of 64K. This size can be > + * contpte-mapped when 4K base pages are in use (16 pages into 1 iTLB entry), > + * and HPA can coalesce it (4 pages into 1 TLB entry) when 16K base pages are in > + * use. > + */ > +#define arch_wants_exec_folio_order() ilog2(SZ_64K >> PAGE_SHIFT) > + To my eye, "arch_wants_foo" and "arch_want_foo" are booleans. Either this arch wants a particular treatment or it does not want it. I suggest a better name would be "arch_exec_folio_order". > static inline bool pud_sect_supported(void) > { > return PAGE_SIZE == SZ_4K; > diff --git a/include/linux/pgtable.h b/include/linux/pgtable.h > index aab227e12493..6cdd145cbbb9 100644 > --- a/include/linux/pgtable.h > +++ b/include/linux/pgtable.h > @@ -407,6 +407,18 @@ static inline bool arch_has_hw_pte_young(void) > } > #endif > > +#ifndef arch_wants_exec_folio_order > +/* > + * Returns preferred minimum folio order for executable file-backed memory. Must > + * be in range [0, PMD_ORDER]. Negative value implies that the HW has no > + * preference and mm will not special-case executable memory in the pagecache. > + */ I think this comment contains material which would be useful above the other arch_wants_exec_folio_order() implementation - the "must be in range" part. So I suggest all this material be incorporated into a single comment which describes arch_wants_exec_folio_order(). Then this comment can be removed entirely. Assume the reader knows to go seek the other definition for the commentary. > +static inline int arch_wants_exec_folio_order(void) > +{ > + return -1; > +} > +#endif > + > #ifndef arch_check_zapped_pte > static inline void arch_check_zapped_pte(struct vm_area_struct *vma, > pte_t pte) > diff --git a/mm/filemap.c b/mm/filemap.c > index 142864338ca4..7954274de11c 100644 > --- a/mm/filemap.c > +++ b/mm/filemap.c > @@ -3118,6 +3118,25 @@ static struct file *do_sync_mmap_readahead(struct vm_fault *vmf) > } > #endif > > + /* > + * Allow arch to request a preferred minimum folio order for executable > + * memory. This can often be beneficial to performance if (e.g.) arm64 > + * can contpte-map the folio. Executable memory rarely benefits from > + * read-ahead anyway, due to its random access nature. "readahead" > + */ > + if (vm_flags & VM_EXEC) { > + int order = arch_wants_exec_folio_order(); > + > + if (order >= 0) { > + fpin = maybe_unlock_mmap_for_io(vmf, fpin); > + ra->size = 1UL << order; > + ra->async_size = 0; > + ractl._index &= ~((unsigned long)ra->size - 1); > + page_cache_ra_order(&ractl, ra, order); > + return fpin; > + } > + } > + > /* If we don't want any read-ahead, don't bother */ > if (vm_flags & VM_RAND_READ) > return fpin;