> On 8. Feb 2018, at 13:17, David Woodhouse <dwmw2@xxxxxxxxxxxxx> wrote:
>
> From: David Woodhouse <dwmw@xxxxxxxxxxxx>
> Subject: [RFC PATCH 2/4] KVM: x86: Reduce retpoline performance impact in slot_handle_level_range()
> Date: 7. February 2018 at 01:03:12 GMT+1
> To: tglx@xxxxxxxxxxxxx, torvalds@xxxxxxxxxxxxxxxxxxxx, x86@xxxxxxxxxx, linux-kernel@xxxxxxxxxxxxxxx, bp@xxxxxxxxx, peterz@xxxxxxxxxxxxx, tim.c.chen@xxxxxxxxxxxxxxx, dave.hansen@xxxxxxxxx, arjan.van.de.ven@xxxxxxxxx
>
> With retpoline, tight loops of "call this function for every XXX" are
> very much pessimised by taking a prediction miss *every* time. This one
> showed up very high in our early testing.
>
> By marking the iterator slot_handle_…() functions always_inline, we can
> ensure that the indirect function call can be optimised away into a
> direct call and it actually generates slightly smaller code because
> some of the other conditionals can get optimised away too.
>
> Suggested-by: Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx>
> Signed-off-by: David Woodhouse <dwmw@xxxxxxxxxxxx>
> ---
> arch/x86/kvm/mmu.c | 10 +++++-----
> 1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index 2b8eb4d..cc83bdc 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -5058,7 +5058,7 @@ void kvm_mmu_uninit_vm(struct kvm *kvm)
>  typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
>  
>  /* The caller should hold mmu-lock before calling this function. */
> -static bool
> +static __always_inline bool
>  slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  			slot_level_handler fn, int start_level, int end_level,
>  			gfn_t start_gfn, gfn_t end_gfn, bool lock_flush_tlb)
> @@ -5088,7 +5088,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  	return flush;
>  }
>  
> -static bool
> +static __always_inline bool
>  slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		  slot_level_handler fn, int start_level, int end_level,
>  		  bool lock_flush_tlb)
> @@ -5099,7 +5099,7 @@ slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		 lock_flush_tlb);
>  }
>  
> -static bool
> +static __always_inline bool
>  slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		      slot_level_handler fn, bool lock_flush_tlb)
>  {
> @@ -5107,7 +5107,7 @@ slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
>  }
>  
> -static bool
> +static __always_inline bool
>  slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  			slot_level_handler fn, bool lock_flush_tlb)
>  {
> @@ -5115,7 +5115,7 @@ slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  				 PT_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
>  }
>  
> -static bool
> +static __always_inline bool
>  slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
>  		 slot_level_handler fn, bool lock_flush_tlb)
>  {
> --
> 2.7.4

+kvm@xxxxxxxxxxxxxxx

With this patch, launches of "large instances" are pretty close to what
we see with nospectre_v2 (within tens of milliseconds).

Reviewed-by: Filippo Sironi <sironi@xxxxxxxxx>
Tested-by: Filippo Sironi <sironi@xxxxxxxxx>

Amazon Development Center Germany GmbH
Berlin - Dresden - Aachen
main office: Krausenstr. 38, 10117 Berlin
Geschaeftsfuehrer: Dr. Ralf Herbrich, Christian Schlaeger
Ust-ID: DE289237879
Eingetragen am Amtsgericht Charlottenburg HRB 149173 B
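
For anyone who wants to see the effect in isolation, here is a minimal
userspace sketch of the devirtualisation the patch relies on. It is not
from the patch itself: the names for_each_item(), item_handler and
clear_item() are invented stand-ins for slot_handle_level_range(),
slot_level_handler and the rmap handlers; only the __always_inline
pattern mirrors the actual change.

#include <stdbool.h>
#include <stddef.h>

#ifndef __always_inline
#define __always_inline inline __attribute__((always_inline))
#endif

/* Hypothetical callback type, standing in for slot_level_handler. */
typedef bool (*item_handler)(int item);

/*
 * Iterator in the style of slot_handle_level_range(). Without the
 * __always_inline, every loop iteration makes an indirect call through
 * 'fn'; with retpoline that indirect call becomes a guaranteed branch
 * misprediction on each iteration.
 */
static __always_inline bool
for_each_item(const int *items, size_t n, item_handler fn)
{
	bool flush = false;
	size_t i;

	for (i = 0; i < n; i++)
		flush |= fn(items[i]);
	return flush;
}

/* Hypothetical handler, standing in for an rmap handler. */
static bool clear_item(int item)
{
	return item != 0;
}

bool clear_all_items(const int *items, size_t n)
{
	/*
	 * Once for_each_item() is inlined here, 'fn' is the known
	 * constant clear_item, so the compiler can emit a direct call
	 * (or inline the handler too) instead of a retpoline thunk.
	 */
	return for_each_item(items, n, clear_item);
}

Comparing the generated code with and without the attribute on a
retpoline build (e.g. gcc -O2 -mindirect-branch=thunk) shows the
per-iteration indirect call replaced by a direct one, which is where
the launch-time win above comes from.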