On Wed, 13 Nov 2019 17:55:30 +0800 Shile Zhang <shile.zhang@xxxxxxxxxxxxxxxxx> wrote:

> vmalloc_sync_all() was put in the common path in
> __purge_vmap_area_lazy(), for one sync issue only happened on X86_32
> with PTI enabled. It is needless for X86_64, which caused a big regression
> in UnixBench Shell8 testing on X86_64.
> Similar regression also reported by 0-day kernel test robot in reaim
> benchmarking:
> https://lists.01.org/hyperkitty/list/lkp@xxxxxxxxxxxx/thread/4D3JPPHBNOSPFK2KEPC6KGKS6J25AIDB/

That is indeed a large performance regression.

> Fix it by adding more conditions.
>
> Fixes: 3f8fd02b1bf1 ("mm/vmalloc: Sync unmappings in __purge_vmap_area_lazy()")
>
> ...
>
> --- a/mm/vmalloc.c
> +++ b/mm/vmalloc.c
> @@ -1255,11 +1255,17 @@ static bool __purge_vmap_area_lazy(unsigned long start, unsigned long end)
>  	if (unlikely(valist == NULL))
>  		return false;
>  
> +#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)

Are we sure that x86_32 is the only architecture which is (or ever will
be) affected?

>  	/*
>  	 * First make sure the mappings are removed from all page-tables
>  	 * before they are freed.
> +	 *
> +	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
> +	 * the case on a PAE kernel with PTI enabled.
>  	 */
> -	vmalloc_sync_all();
> +	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
> +		vmalloc_sync_all();
> +#endif
>  
>  	/*
>  	 * TODO: to calculate a flush range without looping.

CONFIG_X86_PAE depends on CONFIG_X86_32 so no need to check
CONFIG_X86_32?

From: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
Subject: mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix

simplify config expression, use IS_ENABLED()

Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Joerg Roedel <jroedel@xxxxxxx>
Cc: Qian Cai <cai@xxxxxx>
Cc: Shile Zhang <shile.zhang@xxxxxxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Signed-off-by: Andrew Morton <akpm@xxxxxxxxxxxxxxxxxxxx>
---

 mm/vmalloc.c |   21 +++++++++++----------
 1 file changed, 11 insertions(+), 10 deletions(-)

--- a/mm/vmalloc.c~mm-vmalloc-fix-regression-caused-by-needless-vmalloc_sync_all-fix
+++ a/mm/vmalloc.c
@@ -1255,16 +1255,17 @@ static bool __purge_vmap_area_lazy(unsig
 	if (unlikely(valist == NULL))
 		return false;
 
-#if defined(CONFIG_X86_32) && defined(CONFIG_X86_PAE)
-	/*
-	 * First make sure the mappings are removed from all page-tables
-	 * before they are freed.
-	 *
-	 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which is
-	 * the case on a PAE kernel with PTI enabled.
-	 */
-	if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
-		vmalloc_sync_all();
+	if (IS_ENABLED(CONFIG_X86_PAE)) {
+		/*
+		 * First make sure the mappings are removed from all page-tables
+		 * before they are freed.
+		 *
+		 * This is only needed on x86-32 with !SHARED_KERNEL_PMD, which
+		 * is the case on a PAE kernel with PTI enabled.
+		 */
+		if (!SHARED_KERNEL_PMD && boot_cpu_has(X86_FEATURE_PTI))
+			vmalloc_sync_all();
+	}
 #endif
 
 	/*
_
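
For readers unfamiliar with the IS_ENABLED() conversion above, here is a
minimal userspace sketch of the idea. It is an illustration under stated
assumptions, not the kernel's code: every DEMO_* and CONFIG_DEMO_* name is
hypothetical, the macro is a simplified imitation of the one in
include/linux/kconfig.h (it ignores the =m module case), and
demo_sync_all() merely stands in for vmalloc_sync_all().

```c
/*
 * Simplified imitation of the kernel's IS_ENABLED() trick.
 * All DEMO_* names are hypothetical and exist only for this sketch.
 *
 * If the option is defined to 1, DEMO_ARG_PLACEHOLDER_1 expands to "0,",
 * which shifts the literal 1 into the second argument slot. If the option
 * is undefined, the pasted token is junk and the second argument is 0.
 */
#define DEMO_ARG_PLACEHOLDER_1 0,
#define DEMO_TAKE_SECOND(ignored, val, ...) val
#define DEMO_IS_DEFINED3(arg1_or_junk) DEMO_TAKE_SECOND(arg1_or_junk 1, 0, 0)
#define DEMO_IS_DEFINED2(val) DEMO_IS_DEFINED3(DEMO_ARG_PLACEHOLDER_##val)
#define DEMO_IS_ENABLED(option) DEMO_IS_DEFINED2(option)

#define CONFIG_DEMO_PAE 1	/* stands in for CONFIG_X86_PAE=y */
/* CONFIG_DEMO_OTHER is deliberately left undefined. */

/* Counts how often the stand-in sync routine ran. */
static int demo_sync_calls;

/* Stand-in for vmalloc_sync_all(); it only bumps the counter. */
static void demo_sync_all(void)
{
	demo_sync_calls++;
}

/*
 * Mirrors the shape of the patched condition: a compile-time config test
 * wrapped around a run-time test. Returns 1 if a sync was performed.
 */
static int demo_purge(int shared_kernel_pmd, int has_pti)
{
	int before = demo_sync_calls;

	if (DEMO_IS_ENABLED(CONFIG_DEMO_PAE)) {
		if (!shared_kernel_pmd && has_pti)
			demo_sync_all();
	}
	return demo_sync_calls != before;
}
```

Because the macro expands to a constant 0 or 1, the compiler can drop the
dead branch, much as #if would. Unlike #if, the body of the if () is still
parsed and type-checked in every configuration, so every identifier used
inside it has to be declared everywhere the file is built.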