[PATCH v6 0/9] x86/tlb: Concurrent TLB flushes

Nadav Amit <nadav.amit@xxxxxxxxx> · Sat, 20 Feb 2021 15:17:03 -0800

From: Nadav Amit <namit@xxxxxxxxxx>

The series improves TLB shootdown by flushing the local TLB concurrently
with remote TLBs, overlapping the IPI delivery time with the local
flush. Performance numbers can be found in the previous version [1].

v5 was rebased on 5.11 (long time after v4), and had some bugs and
embarrassing build errors. Peter Zijlstra and Christoph Hellwig had some
comments as well. These issues were addressed (excluding one 82-chars
line that I left). Based on their feedback, an additional patch was also
added to reuse on_each_cpu_cond_mask() code and avoid unnecessary calls
by inlining.

KernelCI showed RCU stalls on arm64, which I could not figure out from
the kernel splat. If this issue persists, I would appreciate it someone
can assist in debugging or at least provide the output when running the
kernel with CONFIG_CSD_LOCK_WAIT_DEBUG=Y.

[1] https://lore.kernel.org/lkml/20190823224153.15223-1-namit@xxxxxxxxxx/

v5 -> v6:
* Address build warnings due to rebase mistakes
* Reuse code from on_each_cpu_cond_mask() and inline [PeterZ]
* Fix some style issues [Hellwig]

v4 -> v5:
* Rebase on 5.11
* Move concurrent smp logic to smp_call_function_many_cond() 
* Remove SGI-UV patch which is not needed anymore

v3 -> v4:
* Merge flush_tlb_func_local and flush_tlb_func_remote() [Peter]
* Prevent preemption on_each_cpu(). It is not needed, but it prevents
  concerns. [Peter/tglx]
* Adding acked-, review-by tags

v2 -> v3:
* Open-code the remote/local-flush decision code [Andy]
* Fix hyper-v, Xen implementations [Andrew]
* Fix redundant TLB flushes.

v1 -> v2:
* Removing the patches that Thomas took [tglx]
* Adding hyper-v, Xen compile-tested implementations [Dave]
* Removing UV [Andy]
* Adding lazy optimization, removing inline keyword [Dave]
* Restructuring patch-set

RFCv2 -> v1:
* Fix comment on flush_tlb_multi [Juergen]
* Removing async invalidation optimizations [Andy]
* Adding KVM support [Paolo]

Cc: Andy Lutomirski <luto@xxxxxxxxxx>
Cc: Borislav Petkov <bp@xxxxxxxxx>
Cc: Boris Ostrovsky <boris.ostrovsky@xxxxxxxxxx>
Cc: Dave Hansen <dave.hansen@xxxxxxxxxxxxxxx>
Cc: Haiyang Zhang <haiyangz@xxxxxxxxxxxxx>
Cc: Ingo Molnar <mingo@xxxxxxxxxx>
Cc: Josh Poimboeuf <jpoimboe@xxxxxxxxxx>
Cc: Juergen Gross <jgross@xxxxxxxx>
Cc: "K. Y. Srinivasan" <kys@xxxxxxxxxxxxx>
Cc: Paolo Bonzini <pbonzini@xxxxxxxxxx>
Cc: Peter Zijlstra <peterz@xxxxxxxxxxxxx>
Cc: Rik van Riel <riel@xxxxxxxxxxx>
Cc: Sasha Levin <sashal@xxxxxxxxxx>
Cc: Stephen Hemminger <sthemmin@xxxxxxxxxxxxx>
Cc: Thomas Gleixner <tglx@xxxxxxxxxxxxx>
Cc: kvm@xxxxxxxxxxxxxxx
Cc: linux-hyperv@xxxxxxxxxxxxxxx
Cc: linux-kernel@xxxxxxxxxxxxxxx
Cc: virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx
Cc: x86@xxxxxxxxxx
Cc: xen-devel@xxxxxxxxxxxxxxxxxxxx

Nadav Amit (9):
  smp: Run functions concurrently in smp_call_function_many_cond()
  x86/mm/tlb: Unify flush_tlb_func_local() and flush_tlb_func_remote()
  x86/mm/tlb: Open-code on_each_cpu_cond_mask() for tlb_is_not_lazy()
  x86/mm/tlb: Flush remote and local TLBs concurrently
  x86/mm/tlb: Privatize cpu_tlbstate
  x86/mm/tlb: Do not make is_lazy dirty for no reason
  cpumask: Mark functions as pure
  x86/mm/tlb: Remove unnecessary uses of the inline keyword
  smp: inline on_each_cpu_cond() and on_each_cpu()

 arch/x86/hyperv/mmu.c                 |  10 +-
 arch/x86/include/asm/paravirt.h       |   6 +-
 arch/x86/include/asm/paravirt_types.h |   4 +-
 arch/x86/include/asm/tlbflush.h       |  48 ++++---
 arch/x86/include/asm/trace/hyperv.h   |   2 +-
 arch/x86/kernel/alternative.c         |   2 +-
 arch/x86/kernel/kvm.c                 |  11 +-
 arch/x86/kernel/paravirt.c            |   2 +-
 arch/x86/mm/init.c                    |   2 +-
 arch/x86/mm/tlb.c                     | 176 +++++++++++++----------
 arch/x86/xen/mmu_pv.c                 |  11 +-
 include/linux/cpumask.h               |   6 +-
 include/linux/smp.h                   |  50 +++++--
 include/trace/events/xen.h            |   2 +-
 kernel/smp.c                          | 196 +++++++++++---------------
 15 files changed, 278 insertions(+), 250 deletions(-)

-- 
2.25.1