> On Jul 3, 2019, at 7:04 AM, Juergen Gross <jgross@xxxxxxxx> wrote:
>
> On 03.07.19 01:51, Nadav Amit wrote:
>> To improve TLB shootdown performance, flush the remote and local TLBs
>> concurrently. Introduce flush_tlb_multi() that does so. Introduce
>> paravirtual versions of flush_tlb_multi() for KVM, Xen and hyper-v (Xen
>> and hyper-v are only compile-tested).
>>
>> While the updated smp infrastructure is capable of running a function on
>> a single local core, it is not optimized for this case. The multiple
>> function calls and the indirect branch introduce some overhead, and
>> might make local TLB flushes slower than they were before the recent
>> changes.
>>
>> Before calling the SMP infrastructure, check if only a local TLB flush
>> is needed to restore the lost performance in this common case. This
>> requires to check mm_cpumask() one more time, but unless this mask is
>> updated very frequently, this should not impact performance negatively.
>>
>> Cc: "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>
>> Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
>> Cc: Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>
>> Cc: Sasha Levin <sashal@xxxxxxxxxx>
>> Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
>> Cc: Ingo Molnar <mingo@xxxxxxxxxx>
>> Cc: Borislav Petkov <bp@xxxxxxxxx>
>> Cc: x86@xxxxxxxxxx
>> Cc: Juergen Gross <jgross@xxxxxxxx>
>> Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>> Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
>> Cc: Andy Lutomirski <luto@xxxxxxxxxx>
>> Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
>> Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
>> Cc: linux-hyperv@xxxxxxxxxxxxxxx
>> Cc: linux-kernel@xxxxxxxxxxxxxxx
>> Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
>> Cc: kvm@xxxxxxxxxxxxxxx
>> Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx
>> Signed-off-by: Nadav Amit <namit@xxxxxxxxxx>
>> ---
>>  arch/x86/hyperv/mmu.c                 | 13 +++---
>>  arch/x86/include/asm/paravirt.h       |  6 +--
>>  arch/x86/include/asm/paravirt_types.h |  4 +-
>>  arch/x86/include/asm/tlbflush.h       |  9 ++--
>>  arch/x86/include/asm/trace/hyperv.h   |  2 +-
>>  arch/x86/kernel/kvm.c                 | 11 +++--
>>  arch/x86/kernel/paravirt.c            |  2 +-
>>  arch/x86/mm/tlb.c                     | 65 ++++++++++++++++++++-------
>>  arch/x86/xen/mmu_pv.c                 | 20 ++++++---
>>  include/trace/events/xen.h            |  2 +-
>>  10 files changed, 91 insertions(+), 43 deletions(-)
>
> ...
>
>> diff --git a/arch/x86/xen/mmu_pv.c b/arch/x86/xen/mmu_pv.c
>> index beb44e22afdf..19e481e6e904 100644
>> --- a/arch/x86/xen/mmu_pv.c
>> +++ b/arch/x86/xen/mmu_pv.c
>> @@ -1355,8 +1355,8 @@ static void xen_flush_tlb_one_user(unsigned long addr)
>>      preempt_enable();
>>  }
>>
>> -static void xen_flush_tlb_others(const struct cpumask *cpus,
>> -                                 const struct flush_tlb_info *info)
>> +static void xen_flush_tlb_multi(const struct cpumask *cpus,
>> +                                const struct flush_tlb_info *info)
>>  {
>>      struct {
>>          struct mmuext_op op;
>> @@ -1366,7 +1366,7 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>      const size_t mc_entry_size = sizeof(args->op) +
>>          sizeof(args->mask[0]) * BITS_TO_LONGS(num_possible_cpus());
>>
>> -    trace_xen_mmu_flush_tlb_others(cpus, info->mm, info->start, info->end);
>> +    trace_xen_mmu_flush_tlb_multi(cpus, info->mm, info->start, info->end);
>>
>>      if (cpumask_empty(cpus))
>>          return;     /* nothing to do */
>> @@ -1375,9 +1375,17 @@ static void xen_flush_tlb_others(const struct cpumask *cpus,
>>      args = mcs.args;
>>      args->op.arg2.vcpumask = to_cpumask(args->mask);
>>
>> -    /* Remove us, and any offline CPUS. */
>> +    /* Flush locally if needed and remove us */
>> +    if (cpumask_test_cpu(smp_processor_id(), to_cpumask(args->mask))) {
>> +        local_irq_disable();
>> +        flush_tlb_func_local(info);
>
> I think this isn't the correct function for PV guests.
>
> In fact it should be much easier: just don't clear the own cpu from the
> mask, that's all what's needed. The hypervisor is just fine having the
> current cpu in the mask and it will do the right thing.

Thanks. I will do so in v3.

I don’t think the Hyper-V people would want to do the same, unfortunately,
since it would induce a VM-exit on TLB flushes. But if they do, I will be
able to avoid exposing flush_tlb_func_local().
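
For readers following the thread, the difference between the two approaches
can be sketched in a small user-space model. This is not kernel code:
toy_cpumask_t, struct toy_flush_stats and both toy_* functions are
hypothetical names made up for the illustration, and a "hypercall" is only
counted, never issued. The first function mirrors the v2 hunk quoted above
(flush locally, then drop the current CPU from the mask); the second mirrors
the suggestion to leave the current CPU in the mask and let the hypervisor
handle it.

```c
#include <assert.h>
#include <stdint.h>

/* Toy stand-in for struct cpumask: one bit per CPU, at most 64 CPUs. */
typedef uint64_t toy_cpumask_t;

/* What a single flush request ended up doing in this model. */
struct toy_flush_stats {
    unsigned local_flushes; /* flushes performed by the guest on its own CPU */
    unsigned hypercalls;    /* flush requests handed to the hypervisor */
    toy_cpumask_t hv_mask;  /* CPU mask passed along with that request */
};

/*
 * Model of the v2 approach: if the current CPU is in the mask, flush it
 * locally and remove it, then ask the hypervisor to flush whatever is left.
 */
static struct toy_flush_stats
toy_flush_local_then_remote(toy_cpumask_t cpus, unsigned self)
{
    struct toy_flush_stats s = { 0, 0, 0 };

    assert(self < 64);
    if (cpus & (UINT64_C(1) << self)) {
        s.local_flushes++;
        cpus &= ~(UINT64_C(1) << self);
    }
    if (cpus != 0) {
        s.hypercalls++;
        s.hv_mask = cpus;
    }
    return s;
}

/*
 * Model of the suggested approach: never touch the mask, so the current CPU
 * (if present) is flushed by the hypervisor along with the others.
 */
static struct toy_flush_stats
toy_flush_all_via_hypervisor(toy_cpumask_t cpus, unsigned self)
{
    struct toy_flush_stats s = { 0, 0, 0 };

    assert(self < 64);
    if (cpus != 0) {
        s.hypercalls++;
        s.hv_mask = cpus;
    }
    return s;
}
```

In this model, with mask 0x5 on CPU 0 the first variant does one local flush
and passes 0x4 to the hypervisor, while the second passes 0x5 unchanged and
does no local flush. For a local-only mask (0x1) the second variant still
counts one hypercall, which is the kind of cost mentioned above for Hyper-V.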