On 9/21/22 07:21, Barry Song wrote:
> On Wed, Sep 21, 2022 at 1:50 PM Barry Song <21cnbao@xxxxxxxxx> wrote:
>>
>> On Tue, Sep 20, 2022 at 8:45 PM Anshuman Khandual
>> <anshuman.khandual@xxxxxxx> wrote:
>>>
>>>
>>>
>>> On 9/20/22 09:09, Barry Song wrote:
>>>> On Tue, Sep 20, 2022 at 3:00 PM Anshuman Khandual
>>>> <anshuman.khandual@xxxxxxx> wrote:
>>>>>
>>>>>
>>>>> On 8/22/22 13:51, Yicong Yang wrote:
>>>>>> +static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
>>>>>> +{
>>>>>> +     return true;
>>>>>> +}
>>>>>
>>>>> This needs to be conditional on systems, where there will be performance
>>>>> improvements, and should not just be enabled all the time on all systems.
>>>>> num_online_cpus() > X, which does not hold any cpu hotplug lock would be
>>>>> a good metric ?
>>>>
>>>> for a small system, i don't see how this patch will help, e.g. cpus <= 4;
>>>> so we can actually disable tlb-batch on small systems.
>>>
>>> Do not subscribe ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH based on NR_CPUS ?
>>> That might not help much as the default value is 256 for NR_CPUS.
>>>
>>> OR
>>>
>>> arch_tlbbatch_should_defer() checks on
>>>
>>> 1. online cpus (dont enable batched TLB if <= X)
>>> 2. ARM64_WORKAROUND_REPEAT_TLBI (dont enable batched TLB)
>>>
>>>> just need to check if we will have any race condition since hotplug will
>>>> make the condition true and false dynamically.
>>>
>>> If should_defer_flush() evaluate to be false, then ptep_clear_flush()
>>> clears and flushes the entry right away. This should not race with other
>>> queued up TLBI requests, which will be flushed separately. Wondering how
>>> there can be a race here !
>>
>> Right. How about we make something as below?
>>
>> static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
>> {
>>         /* for a small system very small number of CPUs, TLB shootdown is cheap */
>>         if (num_online_cpus() <= 4 ||
>>             unlikely(this_cpu_has_cap(ARM64_WORKAROUND_REPEAT_TLBI)))
>>                 return false;
>>
>> #ifdef CONFIG_ARM64_WORKAROUND_REPEAT_TLBI
>>         if (unlikely(this_cpu_has_cap(ARM64_WORKAROUND_REPEAT_TLBI)))
>>                 return false;
>> #endif
>>
>>         return true;
>> }
>
> sorry, i mean
>
> static inline bool arch_tlbbatch_should_defer(struct mm_struct *mm)
> {
>         /* for a small system very small number of CPUs, TLB shootdown is cheap */
>         if (num_online_cpus() <= 4)
>                 return false;
>
> #ifdef CONFIG_ARM64_WORKAROUND_REPEAT_TLBI
>         if (unlikely(this_cpu_has_cap(ARM64_WORKAROUND_REPEAT_TLBI)))
>                 return false;
> #endif
>
>         return true;
> }

This is a good starting point.
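
To make the two conditions easy to reason about, here is a rough user-space
model of the policy discussed in this thread. It is only a sketch, not kernel
code: struct tlbbatch_inputs, model_tlbbatch_should_defer() and
TLBBATCH_MIN_ONLINE_CPUS are hypothetical names made up for illustration, and
the two fields merely stand in for num_online_cpus() and
this_cpu_has_cap(ARM64_WORKAROUND_REPEAT_TLBI). The value 4 is the one
proposed above (the "X" from earlier in the thread), not a measured threshold.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical name; the thread hard-codes 4 and earlier calls it "X". */
#define TLBBATCH_MIN_ONLINE_CPUS 4

/* User-space stand-in for the two inputs the kernel helper would read. */
struct tlbbatch_inputs {
        unsigned int online_cpus;  /* models num_online_cpus() */
        bool repeat_tlbi_erratum;  /* models this_cpu_has_cap(ARM64_WORKAROUND_REPEAT_TLBI) */
};

/* Returns true when the unmap path should batch (defer) the TLB flush. */
static bool model_tlbbatch_should_defer(const struct tlbbatch_inputs *in)
{
        assert(in->online_cpus > 0); /* at least the calling CPU is online */

        /* Small systems: TLB shootdown is cheap, so flush right away. */
        if (in->online_cpus <= TLBBATCH_MIN_ONLINE_CPUS)
                return false;

        /* Affected by the repeat-TLBI workaround: do not batch. */
        if (in->repeat_tlbi_erratum)
                return false;

        return true;
}
```

With this model, 4 online CPUs gives false, 8 online CPUs without the erratum
gives true, and 8 online CPUs with the erratum gives false. Since the CPU
count can change with hotplug, two calls may return different answers; as
noted above, a false result only means the entry is cleared and flushed
immediately via ptep_clear_flush(), while already queued requests are flushed
separately.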