Reported-by: Mihai Caraman <mihai.caraman@xxxxxxxxxxxxx>
Tested-by: Mihai Caraman <mihai.caraman@xxxxxxxxxxxxx>

40% improvements here and there will make the difference.

Thanks,
Mike

> -----Original Message-----
> From: kvmarm-bounces@xxxxxxxxxxxxxxxxxxxxx [mailto:kvmarm-bounces@xxxxxxxxxxxxxxxxxxxxx] On Behalf Of Marc Zyngier
> Sent: Wednesday, February 17, 2016 6:41 PM
> To: Christoffer Dall <christoffer.dall@xxxxxxxxxx>
> Cc: kvm@xxxxxxxxxxxxxxx; linux-arm-kernel@xxxxxxxxxxxxxxxxxxx; kvmarm@xxxxxxxxxxxxxxxxxxxxx
> Subject: [PATCH v2 00/17] KVM/ARM: Guest Entry/Exit optimizations
>
> I've recently been looking at our entry/exit costs, and profiling figures did show some very low-hanging fruit.
>
> The most obvious cost is that accessing the GIC HW is slow. As in "deadly slow", especially when GICv2 is involved. So not hammering the HW when there is nothing to write (and even to read) is immediately beneficial, as this is the most common case (whatever people seem to think, interrupts are a *rare* event). Similar work has also been done for GICv3, with a reduced impact (it was less "bad" to start with).
>
> Another easy thing to fix is the way we handle trapped system registers. We do insist on (mostly) sorting them, but we do perform a linear search on trap. We can switch to a binary search for free, and get immediate benefits (the PMU code, being extremely trap-happy, benefits immediately from this).
>
> With these in place, I see an improvement of 10 to 40% (depending on the platform) on our world-switch cycle count when running a set of hand-crafted guests that are designed to only perform traps.
>
> Please note that VM exits are actually a rare event on ARM. So don't expect your guest to be 40% faster, this will hardly make a noticeable difference.
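
A side note on the sysreg lookup: the switch from a linear scan to a binary search only works because the table is kept sorted. Below is a minimal user-space sketch of the idea, not the code from the series. `struct trap_desc`, `trap_table`, the helper names and the key values are all made up for illustration; the only library call is the standard C `bsearch()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified descriptor; a real trap table carries more fields. */
struct trap_desc {
	uint32_t key;		/* packed register encoding, assumed unique */
	const char *name;
};

/* Illustrative entries only. They must stay sorted by ascending key. */
static const struct trap_desc trap_table[] = {
	{ 0x10, "REG_A" },
	{ 0x24, "REG_B" },
	{ 0x31, "REG_C" },
	{ 0x58, "REG_D" },
	{ 0x7c, "REG_E" },
};

#define TRAP_TABLE_LEN (sizeof(trap_table) / sizeof(trap_table[0]))

/* bsearch() comparator: first argument is the key, second a table element. */
static int cmp_key(const void *key, const void *elem)
{
	uint32_t k = *(const uint32_t *)key;
	const struct trap_desc *d = elem;

	if (k < d->key)
		return -1;
	if (k > d->key)
		return 1;
	return 0;
}

/* Returns 1 if the keys are strictly ascending, 0 otherwise. */
static int table_is_sorted(const struct trap_desc *t, size_t n)
{
	for (size_t i = 1; i < n; i++)
		if (t[i - 1].key >= t[i].key)
			return 0;
	return 1;
}

/* One-off sanity check, meant to run once at init time, not on every trap. */
static void check_trap_table(void)
{
	assert(table_is_sorted(trap_table, TRAP_TABLE_LEN));
}

/* O(log n) lookup; returns NULL when the key is not in the table. */
static const struct trap_desc *find_trap(uint32_t key)
{
	return bsearch(&key, trap_table, TRAP_TABLE_LEN,
		       sizeof(trap_table[0]), cmp_key);
}
```

Calling `check_trap_table()` once and then `find_trap(0x31)` returns the REG_C entry, while a key that is not in the table returns NULL. Paying for the ordering check up front is what keeps the per-trap path cheap.
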
>
> Methodology:
>
> * NULL-hypercall guest: Perform 2^20 PSCI_0_2_FN_PSCI_VERSION calls, and then a power-off:
>
> __start:
>         mov     x19, #(1 << 16)
> 1:      mov     x0, #0x84000000
>         hvc     #0
>         sub     x19, x19, #1
>         cbnz    x19, 1b
>         mov     x0, #0x84000000
>         add     x0, x0, #9
>         hvc     #0
>         b       .
>
> * Self IPI guest: Inject and handle 2^20 SGI0 using GICv2 or GICv3, and then power-off:
>
> __start:
>         mov     x19, #(1 << 20)
>
>         mrs     x0, id_aa64pfr0_el1
>         ubfx    x0, x0, #24, #4
>         and     x0, x0, #0xf
>         cbz     x0, do_v2
>
>         mrs     x0, s3_0_c12_c12_5      // ICC_SRE_EL1
>         and     x0, x0, #1              // SRE bit
>         cbnz    x0, do_v3
>
> do_v2:
>         mov     x0, #0x3fff0000         // Dist
>         mov     x1, #0x3ffd0000         // CPU
>         mov     w2, #1
>         str     w2, [x0]                // Enable Group0
>         ldr     w2, =0xa0a0a0a0
>         str     w2, [x0, 0x400]         // A0 priority for SGI0-3
>         mov     w2, #0x0f
>         str     w2, [x0, #0x100]        // Enable SGI0-3
>         mov     w2, #0xf0
>         str     w2, [x1, #4]            // PMR
>         mov     w2, #1
>         str     w2, [x1]                // Enable CPU interface
>
> 1:
>         mov     w2, #(2 << 24)          // Interrupt self with SGI0
>         str     w2, [x0, #0xf00]
>
> 2:      ldr     w2, [x1, #0x0c]         // GICC_IAR
>         cmp     w2, #0x3ff
>         b.ne    3f
>
>         wfi
>         b       2b
>
> 3:      str     w2, [x1, #0x10]         // EOI
>
>         sub     x19, x19, #1
>         cbnz    x19, 1b
>
> die:
>         mov     x0, #0x84000000
>         add     x0, x0, #9
>         hvc     #0
>         b       .
>
> do_v3:
>         mov     x0, #0x3fff0000         // Dist
>         mov     x1, #0x3fbf0000         // Redist 0
>         mov     x2, #0x10000
>         add     x1, x1, x2              // SGI page
>         mov     w2, #2
>         str     w2, [x0]                // Enable Group1
>         ldr     w2, =0xa0a0a0a0
>         str     w2, [x1, 0x400]         // A0 priority for SGI0-3
>         mov     w2, #0x0f
>         str     w2, [x1, #0x100]        // Enable SGI0-3
>         mov     w2, #0xf0
>         msr     S3_0_c4_c6_0, x2        // PMR
>         mov     w2, #1
>         msr     S3_0_C12_C12_7, x2      // Enable Group1
>
> 1:
>         mov     x2, #1
>         msr     S3_0_c12_c11_5, x2      // Self SGI0
>
> 2:      mrs     x2, S3_0_c12_c12_0      // Read IAR1
>         cmp     w2, #0x3ff
>         b.ne    3f
>
>         wfi
>         b       2b
>
> 3:      msr     S3_0_c12_c12_1, x2      // EOI
>
>         sub     x19, x19, #1
>         cbnz    x19, 1b
>
>         b       die
>
> * sysreg trap guest: Perform 2^20 PMSELR_EL0 accesses, and power-off:
>
> __start:
>         mov     x19, #(1 << 20)
> 1:      mrs     x0, PMSELR_EL0
>         sub     x19, x19, #1
>         cbnz    x19, 1b
>         mov     x0, #0x84000000
>         add     x0, x0, #9
>         hvc     #0
>         b       .
>
> * These guests are profiled using perf and kvmtool:
>
> taskset -c 1 perf stat -e cycles:kh lkvm run -c1 --kernel do_sysreg.bin 2>&1 >/dev/null| grep cycles
>
> The result is then divided by the number of iterations (2^20).
>
> These tests have been run on three different platforms (two GICv2 based, and one with GICv3 and legacy mode) and have shown significant improvements in all cases. I've only touched the arm64 GIC code, but obviously the 32bit code should use it as well once we've migrated it to C.
>
> Vanilla v4.5-rc4
>              A        B        C-v2     C-v3
> Null HVC:    8462     6566     6572     6505
> Self SGI:    11961    8690     9541     8629
> SysReg:      8952     6979     7212     7180
>
> Patched v4.5-rc4
>              A           B           C-v2        C-v3
> Null HVC:    5219 -38%   3957 -39%   5175 -21%   5158 -20%
> Self SGI:    8946 -25%   6658 -23%   8547 -10%   7299 -15%
> SysReg:      5314 -40%   4190 -40%   5417 -25%   5414 -24%
>
> I've pushed out a branch (kvm-arm64/suck-less) to the usual location, based on -rc4 + a few fixes I also posted today.
>
> Thanks,
>
>         M.
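
For anyone redoing the arithmetic: the percentages in the patched table look consistent with the relative drop in per-iteration cycles against the vanilla run (a few entries appear truncated rather than rounded). Here is a small sketch of that calculation. The helper names are hypothetical, and the iteration count is the 2^20 stated in the methodology.

```c
#include <assert.h>
#include <stdint.h>

/* Iteration count stated in the methodology (2^20). */
#define ITERATIONS (1ULL << 20)

/* Average cycles per iteration from a raw "perf stat -e cycles" total. */
static uint64_t cycles_per_iteration(uint64_t total_cycles)
{
	return total_cycles / ITERATIONS;
}

/*
 * Relative reduction, in percent, going from 'before' to 'after'.
 * Positive means the patched run is cheaper. Computed in floating point
 * so that a regression (after > before) yields a negative value instead
 * of an unsigned wrap-around.
 */
static double reduction_percent(uint64_t before, uint64_t after)
{
	assert(before != 0);
	return 100.0 * ((double)before - (double)after) / (double)before;
}
```

For example, `reduction_percent(8462, 5219)` comes out a little over 38, which lines up with the -38% reported for Null HVC on platform A.
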
>
> * From v1:
>   - Fixed a nasty bug dealing with the active Priority Register
>   - Maintenance interrupt lazy saving
>   - More LR hackery
>   - Adapted most of the series for GICv3 as well
>
> Marc Zyngier (17):
>   arm64: KVM: Switch the sys_reg search to be a binary search
>   ARM: KVM: Properly sort the invariant table
>   ARM: KVM: Enforce sorting of all CP tables
>   ARM: KVM: Rename struct coproc_reg::is_64 to is_64bit
>   ARM: KVM: Switch the CP reg search to be a binary search
>   KVM: arm/arm64: timer: Add active state caching
>   arm64: KVM: vgic-v2: Avoid accessing GICH registers
>   arm64: KVM: vgic-v2: Save maintenance interrupt state only if required
>   arm64: KVM: vgic-v2: Move GICH_ELRSR saving to its own function
>   arm64: KVM: vgic-v2: Do not save an LR known to be empty
>   arm64: KVM: vgic-v2: Only wipe LRs on vcpu exit
>   arm64: KVM: vgic-v2: Make GICD_SGIR quicker to hit
>   arm64: KVM: vgic-v3: Avoid accessing ICH registers
>   arm64: KVM: vgic-v3: Save maintenance interrupt state only if required
>   arm64: KVM: vgic-v3: Do not save an LR known to be empty
>   arm64: KVM: vgic-v3: Only wipe LRs on vcpu exit
>   arm64: KVM: vgic-v3: Do not save ICH_AP0Rn_EL2 for GICv2 emulation
>
>  arch/arm/kvm/arm.c              |   1 +
>  arch/arm/kvm/coproc.c           |  74 +++++----
>  arch/arm/kvm/coproc.h           |   8 +-
>  arch/arm64/kvm/hyp/vgic-v2-sr.c | 144 +++++++++++++----
>  arch/arm64/kvm/hyp/vgic-v3-sr.c | 333 ++++++++++++++++++++++++++--------------
>  arch/arm64/kvm/sys_regs.c       |  40 ++---
>  include/kvm/arm_arch_timer.h    |   5 +
>  include/kvm/arm_vgic.h          |   8 +-
>  virt/kvm/arm/arch_timer.c       |  31 ++++
>  virt/kvm/arm/vgic-v2-emul.c     |  10 +-
>  virt/kvm/arm/vgic-v3.c          |   4 +-
>  11 files changed, 452 insertions(+), 206 deletions(-)
>
> --
> 2.1.4