On Mon, Apr 09, 2018 at 11:00:40AM +0100, Marc Zyngier wrote: > On 09/04/18 10:44, Christoffer Dall wrote: > > On Fri, Apr 06, 2018 at 04:51:53PM +0100, Dave Martin wrote: > >> On Fri, Apr 06, 2018 at 04:25:57PM +0100, Marc Zyngier wrote: > >>> Hi Dave, > >>> > >>> On 06/04/18 16:01, Dave Martin wrote: > >>>> To make the lazy FPSIMD context switch trap code easier to hack on, > >>>> this patch converts it to C. > >>>> > >>>> This is not amazingly efficient, but the trap should typically only > >>>> be taken once per host context switch. > >>>> > >>>> Signed-off-by: Dave Martin <Dave.Martin@xxxxxxx> > >>>> > >>>> --- > >>>> > >>>> Since RFCv1: > >>>> > >>>> * Fix indentation to be consistent with the rest of the file. > >>>> * Add missing ! to write back to sp with attempting to push regs. > >>>> --- > >>>> arch/arm64/kvm/hyp/entry.S | 57 +++++++++++++++++---------------------------- > >>>> arch/arm64/kvm/hyp/switch.c | 24 +++++++++++++++++++ > >>>> 2 files changed, 46 insertions(+), 35 deletions(-) > >>>> > >>>> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S > >>>> index fdd1068..47c6a78 100644 > >>>> --- a/arch/arm64/kvm/hyp/entry.S > >>>> +++ b/arch/arm64/kvm/hyp/entry.S > >>>> @@ -176,41 +176,28 @@ ENTRY(__fpsimd_guest_restore) > >>>> // x1: vcpu > >>>> // x2-x29,lr: vcpu regs > >>>> // vcpu x0-x1 on the stack > >>>> - stp x2, x3, [sp, #-16]! > >>>> - stp x4, lr, [sp, #-16]! > >>>> - > >>>> -alternative_if_not ARM64_HAS_VIRT_HOST_EXTN > >>>> - mrs x2, cptr_el2 > >>>> - bic x2, x2, #CPTR_EL2_TFP > >>>> - msr cptr_el2, x2 > >>>> -alternative_else > >>>> - mrs x2, cpacr_el1 > >>>> - orr x2, x2, #CPACR_EL1_FPEN > >>>> - msr cpacr_el1, x2 > >>>> -alternative_endif > >>>> - isb > >>>> - > >>>> - mov x3, x1 > >>>> - > >>>> - ldr x0, [x3, #VCPU_HOST_CONTEXT] > >>>> - kern_hyp_va x0 > >>>> - add x0, x0, #CPU_GP_REG_OFFSET(CPU_FP_REGS) > >>>> - bl __fpsimd_save_state > >>>> - > >>>> - add x2, x3, #VCPU_CONTEXT > >>>> - add x0, x2, #CPU_GP_REG_OFFSET(CPU_FP_REGS) > >>>> - bl __fpsimd_restore_state > >>>> - > >>>> - // Skip restoring fpexc32 for AArch64 guests > >>>> - mrs x1, hcr_el2 > >>>> - tbnz x1, #HCR_RW_SHIFT, 1f > >>>> - ldr x4, [x3, #VCPU_FPEXC32_EL2] > >>>> - msr fpexc32_el2, x4 > >>>> -1: > >>>> - ldp x4, lr, [sp], #16 > >>>> - ldp x2, x3, [sp], #16 > >>>> - ldp x0, x1, [sp], #16 > >>>> - > >>>> + stp x2, x3, [sp, #-144]! > >>>> + stp x4, x5, [sp, #16] > >>>> + stp x6, x7, [sp, #32] > >>>> + stp x8, x9, [sp, #48] > >>>> + stp x10, x11, [sp, #64] > >>>> + stp x12, x13, [sp, #80] > >>>> + stp x14, x15, [sp, #96] > >>>> + stp x16, x17, [sp, #112] > >>>> + stp x18, lr, [sp, #128] > >>>> + > >>>> + bl __hyp_switch_fpsimd > >>>> + > >>>> + ldp x4, x5, [sp, #16] > >>>> + ldp x6, x7, [sp, #32] > >>>> + ldp x8, x9, [sp, #48] > >>>> + ldp x10, x11, [sp, #64] > >>>> + ldp x12, x13, [sp, #80] > >>>> + ldp x14, x15, [sp, #96] > >>>> + ldp x16, x17, [sp, #112] > >>>> + ldp x18, lr, [sp, #128] > >>>> + ldp x0, x1, [sp, #144] > >>>> + ldp x2, x3, [sp], #160 > >>> > >>> I can't say I'm overly thrilled with adding another save/restore > >>> sequence. How about treating it like a real guest exit instead? Granted, > >>> there is a bit more overhead to it, but as you pointed out above, this > >>> should be pretty rare... > >> > >> I have no objection to handling this after exiting back to > >> __kvm_vcpu_run(), provided the performance is deemed acceptable. > >> > > > > My guess is that it's going to be visible on non-VHE systems, and given > > that we're doing all of this for performance in the first place, I'm not > > exceited about that approach either. > > My rational is that, as we don't disable FP access across most > exit/entry sequences, we still significantly benefit from the optimization. > Yes, but we will take that cost every time we've blocked (and someone else used fpsimd) or every time we've returned to user space. True, that's slow anywhow, but still... > > I thought it was acceptable to do another save/restore, because it was > > only the GPRs (and equivalent to what the compiler would generate for a > > function call?) and thus not susceptible to the complexities of sysreg > > save/restores. > > Sysreg? What I meant was that this is not saving/restoring any of the system registers, which is where we've had the most changes and maintenance, but is restricted to GPRs, but anyway... > That's not what I'm proposing. What I'm proposing here is that > we treat FP exception as a shallow exit that immediately returns to the > guest without touching them. The overhead is an extra save/restore of > the host's x19-x30, if I got my maths right. I agree that this is > significant, but I'd like to measure this overhead before we go one way > or the other. ...sorry, I didn't realize it was a shallow exit you suggested. That's a different story, and that would probably be in the noise if we measured it. > > > Another alternative would be to go back to Dave's original approach of > > implementing the fpsimd state update to the host's structure in assembly > > directly, but I was having a hard time understanding that. Perhaps I > > just need to try harder. > I'd rather stick to the current C approach, no matter how we perform the > save/restore. It feels a lot more readable and maintainable in the long run. > Agreed. Thanks, -Christoffer _______________________________________________ kvmarm mailing list kvmarm@xxxxxxxxxxxxxxxxxxxxx https://lists.cs.columbia.edu/mailman/listinfo/kvmarm