On Mon, 21 May 2012 11:41:31 +0200, Antonios Motakis <a.motakis at virtualopensystems.com> wrote:
> On 05/21/2012 05:49 AM, Rusty Russell wrote:
> > The guest may turn FPEXC_EN off, but that doesn't mean the state isn't
> > important.  In particular, a (non-SMP) Linux ARM guest will turn the
> > FPEXC_EN bit off on context switch, without saving the contents.
> >
> > Similar logic applies to the host: we might have turned FPEXC_EN off,
> > but that doesn't mean the contents of the registers aren't worth saving.
>
> My first attempted implementation tried to save/restore state regardless
> of FPEXC_EN.  However, it turned out that when FPEXC_EN is 0, most VFP
> instructions will cause an undefined instruction trap (ARM ARM p. B1-1225),
> so the code wouldn't work.  Is the hardware supposed to preserve state
> even when the FPU extensions are disabled?

Yes, in fact, the kernel relies on it; it's pretty standard on most archs
to be able to flip the FPU on and off to implement lazy FPU switching.

The ARM Linux implementation is classic here:

On context switch (vfp_notifier() in vfpmodule.c):
        (SMP only): if FPEXC_EN is set, save the FPU state to the outgoing task.
        Disable FPEXC_EN.

On trap:
        Enable FPEXC_EN.
        If the FPU contents are up to date and belong to the current task, return.
        (UP only): push the FPU state to the previous task (if any).
        Load the FPU state from the current task.

Anyway, here's the patch I'm working on, based on yours.  It compiles;
I haven't run it yet :)  It went through various complicated iterations
yesterday, but I think this is finally the simplest we can do.

It's based on your idea about working in hyp mode: we use HCPTR to stop
the guest from using the FPU.  If the guest traps, we clear the HCPTR
bits, save the host FPU state, restore the guest's, then return to the
guest.  On a normal exit we check whether HCPTR was cleared, and if so,
save the guest FPU state and restore the host's.

This means that the FPU state is always saved and restored across
__kvm_vcpu_run, so we can simply enable FPEXC_EN across that function
and restore it afterwards: no one can see it.

Note that in some cases we could skip saving and restoring the host FPU
state.  But that seems like a premature optimization, and it makes us
depend on the specific Linux kernel saving behavior.
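To make that concrete, here is a rough C sketch of the flow just described.
It is only an illustration: every helper in it (trap_fpu_accesses(),
save_host_vfp(), restore_guest_vfp(), and friends) is invented for the
sketch and stubbed out so it compiles; the real work is done by the
hyp-mode assembly in interrupts.S below.

/* Illustrative sketch only -- not kernel code.  The helpers are
 * hypothetical stand-ins for the hyp-mode assembly in interrupts.S. */
#include <stdbool.h>

struct kvm_vcpu;                                        /* opaque here */

static void trap_fpu_accesses(bool on) { (void)on; }    /* HCPTR cp10/cp11 bits */
static bool exit_was_fpu_trap(void)    { return false; }
static void save_host_vfp(void)        { }              /* FPU regs -> hyp stack */
static void restore_host_vfp(void)     { }              /* hyp stack -> FPU regs */
static void save_guest_vfp(struct kvm_vcpu *v)    { (void)v; }  /* FPU -> vcpu */
static void restore_guest_vfp(struct kvm_vcpu *v) { (void)v; }  /* vcpu -> FPU */
static void enter_guest(struct kvm_vcpu *v)       { (void)v; }

static void vcpu_run_sketch(struct kvm_vcpu *vcpu)
{
        bool guest_owns_fpu = false;

        trap_fpu_accesses(true);        /* any guest VFP/NEON use traps to hyp */
        enter_guest(vcpu);

        if (exit_was_fpu_trap()) {      /* happens at most once per run */
                trap_fpu_accesses(false);
                save_host_vfp();        /* host state goes onto the hyp stack */
                restore_guest_vfp(vcpu);
                guest_owns_fpu = true;
                enter_guest(vcpu);      /* the real code simply erets back */
        }

        /* Normal exit: switch back only if the guest actually used the FPU. */
        if (guest_owns_fpu) {
                save_guest_vfp(vcpu);
                restore_host_vfp();
        }
}

That is all the assembly below does: the host state lives on the hyp
stack, the guest state in vcpu->arch.cp10/cp11, and nothing is copied at
all if the guest never touches VFP/NEON.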
commit 32390a3f9fa0c2fc7e29e1116984a216feef9a28
Author: Rusty Russell <rusty.russell at linaro.org>
Date:   Tue May 22 12:29:23 2012 +0930

    ARM: KVM: lazy save/restore of vfp/neon (UNTESTED!)

    This is based on the non-lazy save/restore patch by Antonios Motakis
    <a.motakis at virtualopensystems.com>.

    In this patch, we use the Hyp Coprocessor Trap Register (HCPTR) to
    trap VFP/NEON instructions, and switch the FPU state at that point.
    To do this we need to enable the FPU (which Linux disables to
    implement lazy FPU save/restore), but it doesn't interfere with the
    rest of the kernel as we always put the FPU back the way we found it
    before we can be preempted.

    Signed-off-by: Rusty Russell <rusty.russell at linaro.org>

diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
index 44abdc8..71b92e6 100644
--- a/arch/arm/include/asm/kvm_host.h
+++ b/arch/arm/include/asm/kvm_host.h
@@ -85,12 +85,24 @@ enum cp15_regs {
 	nr_cp15_regs
 };
 
+enum cp10_regs {
+	FPEXC,			/* Floating Point Exception Control Register */
+	FPSCR,			/* Floating Point Status and Control Register */
+	FPINST,			/* Common VFP Subarchitecture Registers */
+	FPINST2,
+	nr_cp10_regs
+};
+
 struct kvm_vcpu_arch {
 	struct kvm_vcpu_regs regs;
 
 	/* System control coprocessor (cp15) */
 	u32 cp15[nr_cp15_regs];
 
+	/* Floating point registers (VFP and Advanced SIMD/NEON) */
+	u32 cp10[nr_cp10_regs];
+	u32 cp11[64];
+
 	/* Exception Information */
 	u32 hsr;		/* Hyp Syndrom Register */
 	u32 hdfar;		/* Hyp Data Fault Address Register */
diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
index c8c1b91..5bd1849 100644
--- a/arch/arm/kernel/asm-offsets.c
+++ b/arch/arm/kernel/asm-offsets.c
@@ -161,6 +161,8 @@ int main(void)
   DEFINE(VCPU_TID_URW,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URW]));
   DEFINE(VCPU_TID_URO,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_URO]));
   DEFINE(VCPU_TID_PRIV,		offsetof(struct kvm_vcpu, arch.cp15[c13_TID_PRIV]));
+  DEFINE(VCPU_CP10,		offsetof(struct kvm_vcpu, arch.cp10));
+  DEFINE(VCPU_CP11,		offsetof(struct kvm_vcpu, arch.cp11));
   DEFINE(VCPU_REGS,		offsetof(struct kvm_vcpu, arch.regs));
   DEFINE(VCPU_USR_REGS,		offsetof(struct kvm_vcpu, arch.regs.usr_regs));
   DEFINE(VCPU_SVC_REGS,		offsetof(struct kvm_vcpu, arch.regs.svc_regs));
diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 1cce7af..323b93c 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -38,6 +38,8 @@
 #include <asm/kvm_asm.h>
 #include <asm/kvm_mmu.h>
 #include <asm/kvm_emulate.h>
+#include <asm/vfp.h>
+#include "../vfp/vfpinstr.h"	// For fmrx & fmxr
 
 static DEFINE_PER_CPU(unsigned long, kvm_arm_hyp_stack_page);
 
@@ -448,6 +450,20 @@ static inline int handle_exit(struct kvm_vcpu *vcpu, struct kvm_run *run,
 	return arm_exit_handlers[hsr_ec](vcpu, run);
 }
 
+static inline unsigned long save_and_enable_fpexc(void)
+{
+	unsigned long fpexc;
+
+	fpexc = fmrx(FPEXC);
+	fmxr(FPEXC, fpexc | FPEXC_EN);
+	return fpexc;
+}
+
+static void restore_fpexc(unsigned long fpexc)
+{
+	fmxr(FPEXC, fpexc);
+}
+
 /**
  * kvm_arch_vcpu_ioctl_run - the main VCPU run function to execute guest code
  * @vcpu:	The VCPU pointer
@@ -463,6 +479,7 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 {
 	int ret = 0;
 	sigset_t sigsaved;
+	unsigned long fpexc;
 
 	if (run->exit_reason == KVM_EXIT_MMIO) {
 		ret = kvm_handle_mmio_return(vcpu, vcpu->run);
@@ -498,12 +515,16 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
 		update_vttbr(vcpu->kvm);
 
 		local_irq_disable();
+
+		fpexc = save_and_enable_fpexc();
+
 		kvm_guest_enter();
 		vcpu->mode = IN_GUEST_MODE;
 
 		ret = __kvm_vcpu_run(vcpu);
 
 		vcpu->mode = OUTSIDE_GUEST_MODE;
+		restore_fpexc(fpexc);
 		kvm_guest_exit();
 		local_irq_enable();
diff --git a/arch/arm/kvm/interrupts.S b/arch/arm/kvm/interrupts.S
index 0cf4965..512d233 100644
--- a/arch/arm/kvm/interrupts.S
+++ b/arch/arm/kvm/interrupts.S
@@ -21,6 +21,7 @@
 #include <asm/asm-offsets.h>
 #include <asm/kvm_asm.h>
 #include <asm/kvm_arm.h>
+#include <asm/vfp.h>
 
 #define VCPU_USR_REG(_reg_nr)	(VCPU_USR_REGS + (_reg_nr * 4))
 #define VCPU_USR_SP		(VCPU_USR_REG(13))
@@ -236,6 +237,72 @@ ENTRY(__kvm_flush_vm_context)
 	mcr	p15, 0, r11, c10, c2, 1	@ NMRR
 .endm
 
+.macro store_vfp_state vcpu=0, vcpup
+	.if \vcpu == 0
+	vpush	{d0-d15}
+	.else
+	add	r11, \vcpup, #VCPU_CP11
+	vstm	r11!, {d0-d15}
+	.endif
+	mrc	p10, 7, r9, cr7, cr0, 0	@ MVFR0
+	and	r9, r9, #MVFR0_A_SIMD_MASK
+	cmp	r9, #2			@ Check for 32 registers
+	.if \vcpu == 0
+	vpusheq	{d16-d31}
+	.else
+	vstmeq	r11!, {d16-d31}
+	.endif
+
+	mrc	p10, 7, r2, cr8, cr0, 0	@ FPEXC
+	mrc	p10, 7, r3, cr1, cr0, 0	@ FPSCR
+	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
+	beq	1f
+	mrc	p10, 7, r4, cr9, cr0, 0	@ FPINST
+	tst	r2, #FPEXC_FP2V
+	beq	1f
+	mrc	p10, 7, r5, cr10, cr0, 0	@ FPINST2
+
+1:
+	.if \vcpu == 0
+	push	{r2-r5}
+	.else
+	add	r10, \vcpup, #VCPU_CP10
+	stm	r10, {r2-r5}		@ Save FPEXC, FPSCR, FPINST, FPINST2
+	.endif
+.endm
+
+.macro restore_vfp_state vcpu=0, vcpup
+	.if \vcpu == 0
+	pop	{r2-r5}
+	.else
+	add	r10, \vcpup, #VCPU_CP10
+	ldm	r10, {r2-r5}		@ Load FPEXC, FPSCR, FPINST, FPINST2
+	.endif
+
+	mcr	p10, 7, r2, cr8, cr0, 0	@ FPEXC
+	mcr	p10, 7, r3, cr1, cr0, 0	@ FPSCR
+	tst	r2, #FPEXC_EX		@ Check for VFP Subarchitecture
+	beq	1f
+	mcr	p10, 7, r4, cr9, cr0, 0	@ FPINST
+	tst	r2, #FPEXC_FP2V
+	beq	1f
+	mcr	p10, 7, r5, cr10, cr0, 0	@ FPINST2
+
+1:
+	.if \vcpu == 1
+	add	r11, \vcpup, #VCPU_CP11
+	vldm	r11!, {d0-d15}
+	.endif
+	mrc	p10, 7, r9, cr7, cr0, 0	@ MVFR0
+	and	r9, r9, #MVFR0_A_SIMD_MASK
+	cmp	r9, #2			@ Check for 32 registers
+	.if \vcpu == 0
+	vpopeq	{d16-d31}
+	vpop	{d0-d15}
+	.else
+	vldmeq	r11!, {d16-d31}
+	.endif
+.endm
+
 /* Configures the HSTR (Hyp System Trap Register) on entry/return
  * (hardware reset value is 0) */
 .macro set_hstr entry
@@ -298,6 +365,11 @@ ENTRY(__kvm_vcpu_run)
 	@ Trap coprocessor CRx for all x except 2 and 14
 	set_hstr 1
 
+	@ Trap floating point accesses so we can lazy restore.
+	mrc	p15, 4, r2, c1, c1, 2
+	orr	r2, r2, #((1 << 10) | (1 << 11))	@ Trap cp10 and cp11
+	mcr	p15, 4, r2, c1, c1, 2
+
 	@ Write standard A-9 CPU id in MIDR
 	ldr	r1, [r0, #VCPU_MIDR]
 	mcr	p15, 4, r1, c0, c0, 0
@@ -345,6 +417,19 @@ __kvm_vcpu_return:
 	@ Don't trap coprocessor accesses for host kernel
 	set_hstr 0
 
+	@ Save floating point registers if we let the guest use them.
+	mrc	p15, 4, r2, c1, c1, 2
+	tst	r2, #((1 << 10) | (1 << 11))
+	@ Don't trap VFP accesses for host kernel.
+	bic	r2, r2, #((1 << 10) | (1 << 11))
+	mcr	p15, 4, r2, c1, c1, 2
+	bne	after_vfp_restore
+
+	@ Switch VFP/NEON hardware state to the host's
+	store_vfp_state 1, r1
+	restore_vfp_state
+
+after_vfp_restore:
 	@ Reset Hyp-role
 	configure_hyp_role 0, r1
@@ -532,8 +617,11 @@ guest_trap:
 	stmia	r1, {r3, r4, r5}
 	sub	r1, r1, #VCPU_USR_REG(0)
 
-	@ Check if we need the fault information
 	lsr	r2, r0, #HSR_EC_SHIFT
+	cmp	r2, #HSR_EC_CP_0_13
+	beq	restore_vfp
+
+	@ Check if we need the fault information
 	cmp	r2, #HSR_EC_IABT
 	beq	2f
 	cmpne	r2, #HSR_EC_DABT
@@ -558,6 +646,22 @@ guest_trap:
 1:	mov	r0, #ARM_EXCEPTION_HVC
 	b	__kvm_vcpu_return
 
+restore_vfp:
+	@ NEON/VFP used.  Turn on VFP access.
+	mrc	p15, 4, r2, c1, c1, 2
+	bic	r2, r2, #((1 << 10) | (1 << 11))
+	mcr	p15, 4, r2, c1, c1, 2
+
+	@ Push host vfp state onto stack, restore guest from VCPU ptr.
+	@ On a normal exit, we will pop host VFP state in __kvm_vcpu_return.
+	store_vfp_state
+	restore_vfp_state 1, r1
+
+	@ We just need to restore guest regs, then return to guest.
+	add	r0, r1, #VCPU_USR_REG(0)
+	ldmia	r0, {r0-r12}
+	eret
+
 .align
 hyp_irq:
 	push	{r0}