Excerpts from Fabiano Rosas's message of March 18, 2021 2:22 am:
> Nicholas Piggin <npiggin@xxxxxxxxx> writes:
>
>> In the interest of minimising the amount of code that is run in
>> "real-mode", don't handle hcalls in real mode in the P9 path.
>>
>> POWER8 and earlier are much more expensive to exit from HV real mode
>> and switch to host mode, because on those processors HV interrupts get
>> to the hypervisor with the MMU off, and the other threads in the core
>> need to be pulled out of the guest, and SLBs all need to be saved,
>> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
>> in host mode. Hash guests also require a lot of hcalls to run. The
>> XICS interrupt controller requires hcalls to run.
>>
>> By contrast, POWER9 has independent thread switching, and in radix mode
>> the hypervisor is already in a host virtual memory mode when the HV
>> interrupt is taken. Radix + xive guests don't need hcalls to handle
>> interrupts or manage translations.
>>
>> So it's much less important to handle hcalls in real mode in P9.
>>
>> Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
>> ---
>
> <snip>
>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index 497f216ad724..1f2ba8955c6a 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -1147,7 +1147,7 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>>   * This has to be done early, not in kvmppc_pseries_do_hcall(), so
>>   * that the cede logic in kvmppc_run_single_vcpu() works properly.
>>   */
>> -static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
>> +static void kvmppc_cede(struct kvm_vcpu *vcpu)
>
> The comment above needs to be updated I think.
>
>>  {
>>  	vcpu->arch.shregs.msr |= MSR_EE;
>>  	vcpu->arch.ceded = 1;
>> @@ -1403,9 +1403,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
>>  		/* hcall - punt to userspace */
>>  		int i;
>>
>> -		/* hypercall with MSR_PR has already been handled in rmode,
>> -		 * and never reaches here.
>> -		 */
>> +		if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
>> +			/*
>> +			 * Guest userspace executed sc 1, reflect it back as a
>> +			 * privileged program check interrupt.
>> +			 */
>> +			kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
>> +			r = RESUME_GUEST;
>> +			break;
>> +		}
>>
>>  		run->papr_hcall.nr = kvmppc_get_gpr(vcpu, 3);
>>  		for (i = 0; i < 9; ++i)
>> @@ -3740,15 +3746,36 @@ static int kvmhv_p9_guest_entry(struct kvm_vcpu *vcpu, u64 time_limit,
>>  		/* H_CEDE has to be handled now, not later */
>>  		if (trap == BOOK3S_INTERRUPT_SYSCALL && !vcpu->arch.nested &&
>>  		    kvmppc_get_gpr(vcpu, 3) == H_CEDE) {
>> -			kvmppc_nested_cede(vcpu);
>> +			kvmppc_cede(vcpu);
>>  			kvmppc_set_gpr(vcpu, 3, 0);
>>  			trap = 0;
>>  		}
>>  	} else {
>>  		kvmppc_xive_push_vcpu(vcpu);
>>  		trap = kvmhv_load_hv_regs_and_go(vcpu, time_limit, lpcr);
>> -		kvmppc_xive_pull_vcpu(vcpu);
>> +		/* H_CEDE has to be handled now, not later */
>> +		/* XICS hcalls must be handled before xive is pulled */
>> +		if (trap == BOOK3S_INTERRUPT_SYSCALL &&
>> +		    !(vcpu->arch.shregs.msr & MSR_PR)) {
>> +			unsigned long req = kvmppc_get_gpr(vcpu, 3);
>>
>> +			if (req == H_CEDE) {
>> +				kvmppc_cede(vcpu);
>> +				kvmppc_xive_cede_vcpu(vcpu); /* may un-cede */
>> +				kvmppc_set_gpr(vcpu, 3, 0);
>> +				trap = 0;
>> +			}
>> +			if (req == H_EOI || req == H_CPPR ||
>> +			    req == H_IPI || req == H_IPOLL ||
>> +			    req == H_XIRR || req == H_XIRR_X) {
>> +				unsigned long ret;
>> +
>> +				ret = kvmppc_xive_xics_hcall(vcpu, req);
>> +				kvmppc_set_gpr(vcpu, 3, ret);
>> +				trap = 0;
>> +			}
>> +		}
>
> I tried running L2 with xive=off and this code slows down the boot
> considerably. I think we're missing a !vcpu->arch.nested in the
> conditional.

You might be right, the real mode handlers never run if nested is set
so none of these should run I think.

>
> This may also be missing these checks from kvmppc_pseries_do_hcall:
>
> 		if (kvmppc_xics_enabled(vcpu)) {
> 			if (xics_on_xive()) {
> 				ret = H_NOT_AVAILABLE;
> 				return RESUME_GUEST;
> 			}
> 			ret = kvmppc_xics_hcall(vcpu, req);
> (...)

Well this is the formerly real-mode part of the hcall, whereas
pseries_do_hcall is the virt-mode handler so it expects the real mode
has already run.

Hmm, probably it shouldn't be setting trap = 0 if it did not handle the
hcall. I don't know if that's the problem you have or if it's the nested
test but probably should test for this anyway.

> For H_CEDE there might be a similar situation since we're shadowing the
> code above that runs after H_ENTER_NESTED by setting trap to 0 here.

Yes.

Thanks,
Nick
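
For illustration only, here is a small userspace model of the two
adjustments discussed in this thread: skip the early hcall handling when
the vcpu is nested, and clear the trap only when the hcall was actually
handled. It is a sketch, not kernel code. Every name prefixed with
model_ or MODEL_ is hypothetical, the enum values are placeholders, and
the "not handled" sentinel returned by the XICS handler is an assumption
about how such a handler could report that it did nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Placeholder values: these are NOT the kernel's constants. */
enum { MODEL_TRAP_NONE = 0, MODEL_TRAP_SYSCALL = 1 };
enum { MODEL_H_SUCCESS = 0, MODEL_NOT_HANDLED = -1 };
enum { MODEL_H_CEDE = 1, MODEL_H_EOI, MODEL_H_CPPR, MODEL_H_IPI,
       MODEL_H_IPOLL, MODEL_H_XIRR, MODEL_H_XIRR_X, MODEL_H_OTHER };

/* Hypothetical stand-in for the few vcpu fields the logic looks at. */
struct model_vcpu {
	bool nested;       /* models vcpu->arch.nested being set */
	bool msr_pr;       /* models MSR_PR set in the guest MSR */
	bool xics_enabled; /* assumed: handler can service XICS hcalls */
	long gpr3;         /* hcall number on entry, return value on exit */
	bool ceded;
};

static bool model_is_xics_req(long req)
{
	return req == MODEL_H_EOI || req == MODEL_H_CPPR ||
	       req == MODEL_H_IPI || req == MODEL_H_IPOLL ||
	       req == MODEL_H_XIRR || req == MODEL_H_XIRR_X;
}

/* Assumed behaviour: report MODEL_NOT_HANDLED instead of a result. */
static long model_xics_hcall(const struct model_vcpu *v, long req)
{
	(void)req;
	return v->xics_enabled ? MODEL_H_SUCCESS : MODEL_NOT_HANDLED;
}

/*
 * Returns the trap the later (virt-mode) handler should see:
 * MODEL_TRAP_NONE if the hcall was consumed here, otherwise the
 * original trap, untouched.
 */
static int model_early_hcall(struct model_vcpu *v, int trap)
{
	assert(v != NULL);

	/* Not a hypervisor-state hcall, or nested: leave it for later. */
	if (trap != MODEL_TRAP_SYSCALL || v->msr_pr || v->nested)
		return trap;

	long req = v->gpr3;

	if (req == MODEL_H_CEDE) {
		v->ceded = true;
		v->gpr3 = MODEL_H_SUCCESS;
		return MODEL_TRAP_NONE;
	}

	if (model_is_xics_req(req)) {
		long ret = model_xics_hcall(v, req);

		/* Only claim the hcall if the handler did something. */
		if (ret != MODEL_NOT_HANDLED) {
			v->gpr3 = ret;
			return MODEL_TRAP_NONE;
		}
	}

	return trap;
}
```

In this model a nested vcpu, or an XICS hcall the handler declines,
leaves both the trap and r3 unchanged, so the later handler still sees
the original request.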