Dave Martin <Dave.Martin@xxxxxxx> writes:

<snip>

>> >> > @@ -404,10 +444,11 @@ static bool __hyp_text fixup_guest_exit(struct kvm_vcpu *vcpu, u64 *exit_code)
>> >> >           * and restore the guest context lazily.
>> >> >           * If FP/SIMD is not implemented, handle the trap and inject an
>> >> >           * undefined instruction exception to the guest.
>> >> > +         * Similarly for trapped SVE accesses.
>> >> >           */
>> >> > -       if (system_supports_fpsimd() &&
>> >> > -           kvm_vcpu_trap_get_class(vcpu) == ESR_ELx_EC_FP_ASIMD)
>> >> > -               return __hyp_switch_fpsimd(vcpu);
>> >> > +       guest_has_sve = vcpu_has_sve(vcpu);
>> >>
>> >> I'm not sure if it's worth fishing this out here given you are already
>> >> passing vcpu down the chain.
>> >
>> > I wanted to discourage GCC from recomputing this. If you're in a
>> > position to do so, can you look at the disassembly with/without this
>> > factored out and see whether it makes a difference?
>>
>> Hmm, it is hard to tell. There is code motion, but for some reason I'm
>> seeing the static jump code unrolled, for example (original on left):
>>
>> __hyp_switch_fpsimd():                                               __hyp_switch_fpsimd():
>> /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:382   | /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:381
>>                                                                     >   ----: tst     w0, #0x400000
>>                                                                     >   ----: b.eq    22c <fixup_guest_exit+0x1a4>  // b.none
>>                                                                     > arch_static_branch_jump():
>>                                                                     > /home/alex/lsrc/kvm/linux.git/arch/arm64/include/asm/jump_label.h:45
>>                                                                     >   ----: b       38c <fixup_guest_exit+0x304>
>>                                                                     > arch_static_branch():
>>                                                                     > /home/alex/lsrc/kvm/linux.git/arch/arm64/include/asm/jump_label.h:31
>>                                                                     >   ----: nop
>>                                                                     >   ----: b       22c <fixup_guest_exit+0x1a4>
>>                                                                     > test_bit():
>>                                                                     > /home/alex/lsrc/kvm/linux.git/include/asm-generic/bitops/non-atomic.h:106
>>                                                                     >   ----: adrp    x0, 0 <cpu_hwcaps>
>>                                                                     >   ----: ldr     x0, [x0]
>>                                                                     > __hyp_switch_fpsimd():
>>                                                                     > /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:381
>>   ----: tst     w0, #0x400000                                         ----: tst     w0, #0x400000
>>   ----: b.eq    238 <fixup_guest_exit+0x1b0>  // b.none             |   ----: b.eq    22c <fixup_guest_exit+0x1a4>  // b.none
>>   ----: cbz     w21, 238 <fixup_guest_exit+0x1b0>                   |   ----: tbz     w2, #5, 22c <fixup_guest_exit+0x1a4>
>> /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:383   | /home/alex/lsrc/kvm/linux.git/arch/arm64/kvm/hyp/switch.----:382
>>   ----: ldr     w2, [x19, #2040]                                    |   ----: ldr     w2, [x20, #2040]
>>   ----: add     x1, x19, #0x4b0                                     |   ----: add     x1, x20, #0x4b0
>>   ----: ldr     x0, [x19, #2032]                                    |   ----: ldr     x0, [x20, #2032]
>> sve_ffr_offset():                                                     sve_ffr_offset():
>>
>> Putting the calculation of guest_has_sve at the top of __hyp_switch_fpsimd
>> makes most of that go away and just moves things around a little bit. So I
>> guess it could make sense for the fast(ish) path, although I'd be
>> interested to know whether it makes any real difference to the numbers.
>> After all, the first read should be well cached, and moving it through the
>> stack just adds memory and register pressure.
>
> Hmmm, I will have a think about this when I respin.
>
> Explicitly caching guest_has_sve() does reduce the compiler's freedom to
> optimise.
>
> We might be able to mark it as __pure or __attribute_const__ to enable
> the compiler to decide whether to cache the result, but this may not be
> 100% safe.
>
> Part of me would prefer to leave things as they are to avoid the risk of
> breaking the code again...
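To make the __pure idea concrete, the annotation would look something like
the sketch below. This is only an illustration: I'm assuming vcpu_has_sve()
ends up as a static inline over a vcpu flag (the exact definition and flag
name in your series may well differ), and the hint is only safe if the flag
genuinely can't change under us:

    /*
     * Sketch only: KVM_ARM64_GUEST_HAS_SVE and the flags field are
     * assumptions about the series.  __pure tells GCC the result depends
     * only on the argument and global memory, so repeated calls inside
     * fixup_guest_exit() can be merged instead of re-reading the flag.
     */
    static inline bool __pure vcpu_has_sve(const struct kvm_vcpu *vcpu)
    {
            return system_supports_sve() &&
                    (vcpu->arch.flags & KVM_ARM64_GUEST_HAS_SVE);
    }

That would let the compiler decide whether caching the result is worthwhile
rather than forcing it via the local variable, at the cost of the
uncertainty you mention above.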
Given that the only place you call __hyp_switch_fpsimd is here, you could
just roll it into __hyp_trap_is_fpsimd and have:

    if (__hyp_trap_is_fpsimd(vcpu))
            return true;

--
Alex Bennée
_______________________________________________
kvmarm mailing list
kvmarm@xxxxxxxxxxxxxxxxxxxxx
https://lists.cs.columbia.edu/mailman/listinfo/kvmarm
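For reference, the folded-together check being suggested might look roughly
like the sketch below. It is based only on the hunk quoted above; the exact
shape of __hyp_trap_is_fpsimd(), the SVE trap-class handling and the helper
names are assumptions about the rest of the series:

    /*
     * Sketch: fold the "is this an FP/SIMD (or SVE) trap we can handle?"
     * test together with the lazy FP/SIMD switch, so the caller in
     * fixup_guest_exit() reduces to a single check.  ESR_ELx_EC_SVE and
     * __hyp_switch_fpsimd() are taken from the series under review and
     * may not match its final form.
     */
    static bool __hyp_text __hyp_trap_is_fpsimd(struct kvm_vcpu *vcpu)
    {
            u8 ec = kvm_vcpu_trap_get_class(vcpu);

            if (!(ec == ESR_ELx_EC_FP_ASIMD && system_supports_fpsimd()) &&
                !(ec == ESR_ELx_EC_SVE && vcpu_has_sve(vcpu)))
                    return false;

            return __hyp_switch_fpsimd(vcpu);
    }

with the caller in fixup_guest_exit() then becoming:

            if (__hyp_trap_is_fpsimd(vcpu))
                    return true;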