On 10/28/22 13:42, Yicong Yang wrote:
> +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
> +{
> +	/*
> +	 * TLB batched flush is proved to be beneficial for systems with large
> +	 * number of CPUs, especially system with more than 8 CPUs. TLB shutdown
> +	 * is cheap on small systems which may not need this feature. So use
> +	 * a threshold for enabling this to avoid potential side effects on
> +	 * these platforms.
> +	 */
> +	if (num_online_cpus() <= CONFIG_ARM64_NR_CPUS_FOR_BATCHED_TLB)
> +		return false;
> +
> +#ifdef CONFIG_ARM64_WORKAROUND_REPEAT_TLBI
> +	if (unlikely(this_cpu_has_cap(ARM64_WORKAROUND_REPEAT_TLBI)))
> +		return false;
> +#endif

should_defer_flush() is immediately followed by set_tlb_ubc_flush_pending(),
which calls arch_tlbbatch_add_mm() and triggers the actual TLBI flush via
__flush_tlb_page_nosync(). It should be okay to check the capability with
this_cpu_has_cap(), as the entire call chain here executes on the same CPU.
But just wondering whether cpus_have_const_cap() would be simpler, more
consistent, and also cost-effective?

Regardless, a comment is needed before the #ifdef block explaining why it
does not make sense to defer/batch when the __tlbi()/__tlbi_user()
implementation will execute 'dsb(ish)' between two TLBI instructions to
work around the errata. A possible sketch follows at the end of this reply.

> +
> +	return true;
> +}
> +
> +static inline void arch_tlbbatch_add_mm(struct arch_tlbflush_unmap_batch *batch,
> +					struct mm_struct *mm,
> +					unsigned long uaddr)
> +{
> +	__flush_tlb_page_nosync(mm, uaddr);
> +}
> +
> +static inline void arch_tlbbatch_flush(struct arch_tlbflush_unmap_batch *batch)
> +{
> +	dsb(ish);
> +}
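
Something along these lines, perhaps. Just a sketch, using the
cpus_have_const_cap() variant mentioned above for illustration; the exact
wording is of course up to you:

	/*
	 * TLB flush deferral is not required on systems affected by
	 * ARM64_WORKAROUND_REPEAT_TLBI, as the __tlbi()/__tlbi_user()
	 * implementation there will already execute a 'dsb(ish)' between
	 * the two TLBI instructions to work around the errata, defeating
	 * the purpose of batching (i.e. saving the overall 'dsb ish' cost).
	 */
#ifdef CONFIG_ARM64_WORKAROUND_REPEAT_TLBI
	if (unlikely(cpus_have_const_cap(ARM64_WORKAROUND_REPEAT_TLBI)))
		return false;
#endif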