On Fri, Apr 06, 2018 at 04:51:53PM +0100, Dave Martin wrote:
> On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote:
> > Hi Dave,
> >
> > On 06/04/18 16:01, Dave Martin wrote:
> > > To make the lazy FPSIMD context switch trap code easier to hack on,
> > > this patch converts it to C.
> > >
> > > This is not amazingly efficient, but the trap should typically only
> > > be taken once per host context switch.
> > >
> > > Signed-off-by: Dave Martin <Dave.Martin@xxxxxxx>
> > >
> > > ---
> > >
> > > Since RFCv1:
> > >
> > >  * Fix indentation to be consistent with the rest of the file.
> > >  * Add missing ! to write back to sp with attempting to push regs.
> > > ---
> > >  arch/arm64/kvm/hyp/entry.S  | 57 +++++++++++++++++--------------------------
> > >  arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++
> > >  2 files changed, 46 insertions(+), 35 deletions(-)
> > >
> > > diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> > > index fdd1068..47c6a78 100644
> > > --- a/arch/arm64/kvm/hyp/entry.S
> > > +++ b/arch/arm64/kvm/hyp/entry.S
> > > @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore)
> > >  	// x1: vcpu
> > >  	// x2-x29,lr: vcpu regs
> > >  	// vcpu x0-x1 on the stack
> > > -	stp	x2, x3, [sp, #-16]!
> > > -	stp	x4, lr, [sp, #-16]!
> > > -
> > > -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN
> > > -	mrs	x2, cptr_el2
> > > -	bic	x2, x2, #CPTR_EL2_TFP
> > > -	msr	cptr_el2, x2
> > > -alternative_else
> > > -	mrs	x2, cpacr_el1
> > > -	orr	x2, x2, #CPACR_EL1_FPEN
> > > -	msr	cpacr_el1, x2
> > > -alternative_endif
> > > -	isb
> > > -
> > > -	mov	x3, x1
> > > -
> > > -	ldr	x0, [x3, #VCPU_HOST_CONTEXT]
> > > -	kern_hyp_va x0
> > > -	add	x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > > -	bl	__fpsimd_save_state
> > > -
> > > -	add	x2, x3, #VCPU_CONTEXT
> > > -	add	x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS)
> > > -	bl	__fpsimd_restore_state
> > > -
> > > -	// Skip restoring fpexc32 for AArch64 guests
> > > -	mrs	x1, hcr_el2
> > > -	tbnz	x1, #HCR_RW_SHIFT, 1f
> > > -	ldr	x4, [x3, #VCPU_FPEXC32_EL2]
> > > -	msr	fpexc32_el2, x4
> > > -1:
> > > -	ldp	x4, lr, [sp], #16
> > > -	ldp	x2, x3, [sp], #16
> > > -	ldp	x0, x1, [sp], #16
> > > -
> > > +	stp	x2, x3, [sp, #-144]!
> > > +	stp	x4, x5, [sp, #16]
> > > +	stp	x6, x7, [sp, #32]
> > > +	stp	x8, x9, [sp, #48]
> > > +	stp	x10, x11, [sp, #64]
> > > +	stp	x12, x13, [sp, #80]
> > > +	stp	x14, x15, [sp, #96]
> > > +	stp	x16, x17, [sp, #112]
> > > +	stp	x18, lr, [sp, #128]
> > > +
> > > +	bl	__hyp_switch_fpsimd
> > > +
> > > +	ldp	x4, x5, [sp, #16]
> > > +	ldp	x6, x7, [sp, #32]
> > > +	ldp	x8, x9, [sp, #48]
> > > +	ldp	x10, x11, [sp, #64]
> > > +	ldp	x12, x13, [sp, #80]
> > > +	ldp	x14, x15, [sp, #96]
> > > +	ldp	x16, x17, [sp, #112]
> > > +	ldp	x18, lr, [sp, #128]
> > > +	ldp	x0, x1, [sp, #144]
> > > +	ldp	x2, x3, [sp], #160
> >
> > I can't say I'm overly thrilled with adding another save/restore
> > sequence. How about treating it like a real guest exit instead? Granted,
> > there is a bit more overhead to it, but as you pointed out above, this
> > should be pretty rare...
>
> I have no objection to handling this after exiting back to
> __kvm_vcpu_run(), provided the performance is deemed acceptable.
>

My guess is that it's going to be visible on non-VHE systems, and given
that we're doing all of this for performance in the first place, I'm not
excited about that approach either.

I thought it was acceptable to do another save/restore, because it was
only the GPRs (and equivalent to what the compiler would generate for a
function call?) and thus not susceptible to the complexities of sysreg
save/restores.

Another alternative would be to go back to Dave's original approach of
implementing the fpsimd state update to the host's structure in assembly
directly, but I was having a hard time understanding that.  Perhaps I
just need to try harder.

Thoughts?

Thanks,
-Christoffer
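
For anyone checking the offsets in the quoted save/restore sequence, the
frame arithmetic can be modelled in a few lines of standalone C.  This
is only a sketch under stated assumptions: struct trap_frame and
gpr_slot_offset() are hypothetical names made up for illustration and
are not part of the patch or the kernel.  It assumes 8-byte registers
and that the vcpu's x0/x1 pair was already on the stack, as the comment
in entry.S says.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical model (not kernel code) of the stack as seen from sp
 * immediately after "stp x2, x3, [sp, #-144]!" in the quoted patch.
 */
struct trap_frame {
	uint64_t x2_to_x18[17];	/* x2..x18, byte offsets 0..128 */
	uint64_t lr;		/* stored alongside x18, offset 136 */
	uint64_t x0_x1[2];	/* vcpu x0-x1, already on the stack, offset 144 */
};

/* Nine 16-byte pairs are pushed by the new sequence: 9 * 16 = 144. */
_Static_assert(offsetof(struct trap_frame, x0_x1) == 144,
	       "x0/x1 must sit just above the 144-byte push");
/* The final "ldp x2, x3, [sp], #160" pops the whole frame. */
_Static_assert(sizeof(struct trap_frame) == 160,
	       "frame is 144 bytes of new saves plus 16 for x0/x1");

/* Byte offset from sp of the slot holding register xN, for N in 2..18. */
static size_t gpr_slot_offset(unsigned int n)
{
	assert(n >= 2 && n <= 18);
	return offsetof(struct trap_frame, x2_to_x18) +
	       (size_t)(n - 2) * sizeof(uint64_t);
}
```

With this model, gpr_slot_offset(4) is 16 and gpr_slot_offset(18) is
128, which lines up with the "stp x4, x5, [sp, #16]" and
"stp x18, lr, [sp, #128]" instructions above.  The saved set, x2-x18
plus lr, is also the part of the register file that AAPCS64 does not
require a C callee to preserve (x19-x28 are callee-saved), which is
consistent with the point about this being what a function call would
cost anyway.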