This short patch series combines the previous armv7 and armv8 versions.
For an FP and lmbench load it reduces fp/simd context switches from 30-50%
down to 2%. Results will vary with load but are no worse than the current
approach.

In summary, the current lazy vfp/simd implementation switches hardware context
only on guest access and again on exit to host; otherwise the hardware context
switch is skipped. This patch set builds on that functionality and executes a
hardware context switch only when the vCPU is scheduled out or returns to user
space.

Patches were tested on the FVP software platform. FP crunching applications
that sum up values and compare the outcome to a known result were executed on
several guests and on the host.

The test can be found here: https://github.com/mjsmar/arm-arm64-fpsimd-test
Tests ran for 24 hours.

armv7 test:
- On host executed 12 fp crunching applications - used taskset to bind
- Two guests - with 12 fp crunching processes - used taskset to bind
- half ran with 1ms sleep, remaining with no sleep

armv8 test:
- same as above except used a mix of armv7 and armv8 guests.

Every so often a fault was injected (via a proc file entry) and the mismatch
between the expected and crunched summed value was reported. The FP crunch
processes could continue to run, but with bad results.

Looked at 'paranoia.c' - it appears to be a comprehensive hardware FP
precision/behavior test. It tests various behaviors and may fail for reasons
that have nothing to do with the world switch of fp/simd -

- Adequacy of guard digits for Mult., Div. and Subt.
- UnderflowThreshold = an underflow threshold.
- V = an overflow threshold, roughly.
...

With outcomes like -
- Smallest strictly positive number found is E0 = 4.94066e-324
- Searching for Overflow threshold: This may generate an error.
...

Personally I don't understand everything it's doing. Opted to use the simple
tst-float executable.
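
For readers who want a feel for the kind of FP crunch check described above,
here is a minimal sketch. It is not the tst-float source from the repository;
the function names (crunch_sum, expected_sum, run_crunch) and the choice of
summing 1..n are assumptions made purely for illustration. It sums values in
double precision, compares against a known closed-form result, and optionally
sleeps between rounds, roughly mirroring the "1ms sleep" and "no sleep"
variants.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <time.h>

/* Sum 1 + 2 + ... + n in double precision. The accumulator is volatile so
 * the compiler keeps real FP loads/stores/adds in the loop instead of
 * folding the result at compile time. Exact while n*(n+1)/2 < 2^53. */
double crunch_sum(unsigned int n)
{
    volatile double sum = 0.0;

    for (unsigned int i = 1; i <= n; i++)
        sum += (double)i;
    return sum;
}

/* Known result for the same series: n * (n + 1) / 2. */
double expected_sum(unsigned int n)
{
    return (double)n * ((double)n + 1.0) / 2.0;
}

/* Run `iterations` rounds and return the number of mismatches seen.
 * sleep_ms == 0 means "no sleep"; otherwise sleep that long per round
 * so the task is scheduled out between rounds. */
unsigned long run_crunch(unsigned long iterations, unsigned int n,
                         unsigned int sleep_ms)
{
    unsigned long mismatches = 0;
    struct timespec ts = {
        .tv_sec = sleep_ms / 1000,
        .tv_nsec = (long)(sleep_ms % 1000) * 1000000L,
    };

    for (unsigned long it = 0; it < iterations; it++) {
        double got = crunch_sum(n);
        double want = expected_sum(n);

        if (got != want) {
            mismatches++;
            fprintf(stderr, "round %lu: expected %.17g, got %.17g\n",
                    it, want, got);
        }
        if (sleep_ms)
            nanosleep(&ts, NULL);
    }
    return mismatches;
}
```

A long-running instance would call run_crunch() in a loop from main() and be
pinned to a CPU with taskset, as in the test setup above. A corrupted fp/simd
register across a world switch should show up as a nonzero mismatch count.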

These patches are based on earlier arm64 fp/simd optimization work -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-July/015748.html

And subsequent fixes by Marc and Christoffer at KVM Forum hackathon to handle
32-bit guest on 64 bit host -
https://lists.cs.columbia.edu/pipermail/kvmarm/2015-August/016128.html

Changes since v2->v3:
- combined arm v7 and v8 into one short patch series
- moved access to fpexec_el2 back to EL2
- Move host restore to EL1 from EL2 and call directly from host
- optimize trap enable code
- renamed some variables to match usage

Changes since v1->v2:
- Fixed vfp/simd trap configuration to enable trace trapping
- Removed set_hcptr branch label
- Fixed handling of FPEXC to restore guest and host versions on vcpu_put
- Tested arm32/arm64
- rebased to 4.3-rc2
- changed a couple register accesses from 64 to 32 bit

Mario Smarduch (3):
  hooks for armv7 fp/simd lazy switch support
  enable enhanced armv7 fp/simd lazy switch
  enable enhanced armv8 fp/simd lazy switch

 arch/arm/include/asm/kvm_host.h   |  7 +++++
 arch/arm/kernel/asm-offsets.c     |  2 ++
 arch/arm/kvm/arm.c                |  6 ++++
 arch/arm/kvm/interrupts.S         | 60 ++++++++++++++++++++++++++++-----------
 arch/arm/kvm/interrupts_head.S    | 14 +++++----
 arch/arm64/include/asm/kvm_host.h |  4 +++
 arch/arm64/kernel/asm-offsets.c   |  1 +
 arch/arm64/kvm/hyp.S              | 37 ++++++++++++++++++++----
 8 files changed, 103 insertions(+), 28 deletions(-)

--
1.9.1