On 08/10/13 12:26, Raghavendra KT wrote:
> On Mon, Oct 7, 2013 at 9:10 PM, Marc Zyngier <marc.zyngier@xxxxxxx> wrote:
>> On an (even slightly) oversubscribed system, spinlocks are quickly
>> becoming a bottleneck, as some vcpus are spinning, waiting for a
>> lock to be released, while the vcpu holding the lock may not be
>> running at all.
>>
>> This creates contention, and the observed slowdown is 40x for
>> hackbench. No, this isn't a typo.
>>
>> The solution is to trap blocking WFEs and tell KVM that we're
>> now spinning. This ensures that other vcpus will get a scheduling
>> boost, allowing the lock to be released more quickly.
>>
>> From a performance point of view: hackbench 1 process 1000
>>
>> 2xA15 host (baseline):  1.843s
>>
>> 2xA15 guest w/o patch:  2.083s
>> 4xA15 guest w/o patch: 80.212s
>>
>> 2xA15 guest w/ patch:   2.072s
>> 4xA15 guest w/ patch:   3.202s
>>
>> So we go from a 40x degradation to 1.5x, which is vaguely more
>> acceptable.
>>
>> Signed-off-by: Marc Zyngier <marc.zyngier@xxxxxxx>
>> ---
>>  arch/arm/include/asm/kvm_arm.h | 4 +++-
>>  arch/arm/kvm/handle_exit.c     | 6 +++++-
>>  2 files changed, 8 insertions(+), 2 deletions(-)
>>
>> diff --git a/arch/arm/include/asm/kvm_arm.h b/arch/arm/include/asm/kvm_arm.h
>> index 64e9696..693d5b2 100644
>> --- a/arch/arm/include/asm/kvm_arm.h
>> +++ b/arch/arm/include/asm/kvm_arm.h
>> @@ -67,7 +67,7 @@
>>   */
>>  #define HCR_GUEST_MASK (HCR_TSC | HCR_TSW | HCR_TWI | HCR_VM | HCR_BSU_IS | \
>>  			HCR_FB | HCR_TAC | HCR_AMO | HCR_IMO | HCR_FMO | \
>> -			HCR_SWIO | HCR_TIDCP)
>> +			HCR_TWE | HCR_SWIO | HCR_TIDCP)
>>  #define HCR_VIRT_EXCP_MASK (HCR_VA | HCR_VI | HCR_VF)
>>
>>  /* System Control Register (SCTLR) bits */
>> @@ -208,6 +208,8 @@
>>  #define HSR_EC_DABT	(0x24)
>>  #define HSR_EC_DABT_HYP	(0x25)
>>
>> +#define HSR_WFI_IS_WFE	(1U << 0)
>> +
>>  #define HSR_HVC_IMM_MASK	((1UL << 16) - 1)
>>
>>  #define HSR_DABT_S1PTW	(1U << 7)
>> diff --git a/arch/arm/kvm/handle_exit.c b/arch/arm/kvm/handle_exit.c
>> index df4c82d..c4c496f 100644
>> --- a/arch/arm/kvm/handle_exit.c
>> +++ b/arch/arm/kvm/handle_exit.c
>> @@ -84,7 +84,11 @@ static int handle_dabt_hyp(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  static int kvm_handle_wfi(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>  {
>>  	trace_kvm_wfi(*vcpu_pc(vcpu));
>> -	kvm_vcpu_block(vcpu);
>> +	if (kvm_vcpu_get_hsr(vcpu) & HSR_WFI_IS_WFE)
>> +		kvm_vcpu_on_spin(vcpu);
>
> Could you also enable CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT for arm and
> check if the ple handler logic helps further?
> We would ideally get one more optimization folded into the ple handler
> if you enable that.

Just gave it a go, and the results are slightly (but consistently)
worse. Over 10 runs:

Without RELAX_INTERCEPT: average run 3.3623s
With RELAX_INTERCEPT:    average run 3.4226s

Not massive, but still noticeable. Any clue?

	M.

--
Jazz is not dead. It just smells funny...
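
A minimal sketch of what "enabling CONFIG_HAVE_KVM_CPU_RELAX_INTERCEPT
for arm" might look like, assuming the arch/arm/kvm/Kconfig layout of
that era (the surrounding select lines are elided, not quoted from the
thread):

	config KVM
		bool "Kernel-based Virtual Machine (KVM) support"
		...
		select HAVE_KVM_CPU_RELAX_INTERCEPT
		...

Selecting that symbol makes kvm_vcpu_on_spin() apply the directed-yield
eligibility tracking in virt/kvm/kvm_main.c (the heuristic the PLE
handler relies on), instead of treating every other vcpu as an equally
good yield candidate.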