Excerpts from Fabiano Rosas's message of March 24, 2021 8:57 am:
> Nicholas Piggin <npiggin@xxxxxxxxx> writes:
>
>> In the interest of minimising the amount of code that is run in
>> "real-mode", don't handle hcalls in real mode in the P9 path.
>>
>> POWER8 and earlier are much more expensive to exit from HV real mode
>> and switch to host mode, because on those processors HV interrupts get
>> to the hypervisor with the MMU off, and the other threads in the core
>> need to be pulled out of the guest, and SLBs all need to be saved,
>> ERATs invalidated, and host SLB reloaded before the MMU is re-enabled
>> in host mode. Hash guests also require a lot of hcalls to run. The
>> XICS interrupt controller requires hcalls to run.
>>
>> By contrast, POWER9 has independent thread switching, and in radix mode
>> the hypervisor is already in a host virtual memory mode when the HV
>> interrupt is taken. Radix + xive guests don't need hcalls to handle
>> interrupts or manage translations.
>>
>> So it's much less important to handle hcalls in real mode in P9.
>>
>> Signed-off-by: Nicholas Piggin <npiggin@xxxxxxxxx>
>> ---
>
> <snip>
>
>> diff --git a/arch/powerpc/kvm/book3s_hv.c b/arch/powerpc/kvm/book3s_hv.c
>> index fa7614c37e08..17739aaee3d8 100644
>> --- a/arch/powerpc/kvm/book3s_hv.c
>> +++ b/arch/powerpc/kvm/book3s_hv.c
>> @@ -1142,12 +1142,13 @@ int kvmppc_pseries_do_hcall(struct kvm_vcpu *vcpu)
>>  }
>>
>>  /*
>> - * Handle H_CEDE in the nested virtualization case where we haven't
>> - * called the real-mode hcall handlers in book3s_hv_rmhandlers.S.
>> + * Handle H_CEDE in the P9 path where we don't call the real-mode hcall
>> + * handlers in book3s_hv_rmhandlers.S.
>> + *
>>   * This has to be done early, not in kvmppc_pseries_do_hcall(), so
>>   * that the cede logic in kvmppc_run_single_vcpu() works properly.
>>   */
>> -static void kvmppc_nested_cede(struct kvm_vcpu *vcpu)
>> +static void kvmppc_cede(struct kvm_vcpu *vcpu)
>>  {
>>  	vcpu->arch.shregs.msr |= MSR_EE;
>>  	vcpu->arch.ceded = 1;
>> @@ -1403,9 +1404,15 @@ static int kvmppc_handle_exit_hv(struct kvm_vcpu *vcpu,
>>  		/* hcall - punt to userspace */
>>  		int i;
>>
>> -		/* hypercall with MSR_PR has already been handled in rmode,
>> -		 * and never reaches here.
>> -		 */
>> +		if (unlikely(vcpu->arch.shregs.msr & MSR_PR)) {
>> +			/*
>> +			 * Guest userspace executed sc 1, reflect it back as a
>> +			 * privileged program check interrupt.
>> +			 */
>> +			kvmppc_core_queue_program(vcpu, SRR1_PROGPRIV);
>> +			r = RESUME_GUEST;
>> +			break;
>> +		}
>
> This patch bypasses sc_1_fast_return so it breaks KVM-PR. L1 loops with
> the following output:
>
> [ 9.503929][ T3443] Couldn't emulate instruction 0x4e800020 (op 19 xop 16)
> [ 9.503990][ T3443] kvmppc_exit_pr_progint: emulation at 48f4 failed (4e800020)
> [ 9.504080][ T3443] Couldn't emulate instruction 0x4e800020 (op 19 xop 16)
> [ 9.504170][ T3443] kvmppc_exit_pr_progint: emulation at 48f4 failed (4e800020)
>
> 0x4e800020 is a blr after a sc 1 in SLOF.
>
> For KVM-PR we need to inject a 0xc00 at some point, either here or
> before branching to no_try_real in book3s_hv_rmhandlers.S.

Ah, I didn't know about that PR KVM (I suppose I should test it but I
haven't been able to get it running in the past).

Should be able to deal with that. This patch probably shouldn't change
the syscall behaviour like this anyway.

Thanks,
Nick
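
For illustration only, here is a minimal standalone sketch (not kernel code, and not necessarily what the next revision will do) of the two ways an sc 1 from guest userspace could be reflected: as a privileged program check, which is what the hunk above does, or as a 0xc00 system call interrupt, which is what KVM-PR in L1 needs. The MSR_PR and SRR1_PROGPRIV values and the 0x700/0xc00 vectors match the powerpc headers; struct toy_vcpu, the may_run_pr_kvm flag and toy_handle_sc1() are invented for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* These values mirror the Linux powerpc headers; the rest is a toy model. */
#define MSR_PR            (1ULL << 14)   /* problem state (guest userspace) */
#define SRR1_PROGPRIV     0x00040000ULL  /* privileged instruction program check */
#define INTERRUPT_PROGRAM 0x700
#define INTERRUPT_SYSCALL 0xc00

enum exit_action { RESUME_GUEST, HANDLE_HCALL };

/* Hypothetical stand-in for struct kvm_vcpu: only what the sketch needs. */
struct toy_vcpu {
	uint64_t msr;            /* guest MSR when sc 1 was executed */
	bool may_run_pr_kvm;     /* assumed policy flag, not a real kernel field */
	unsigned int queued_vec; /* vector queued for the guest, 0 if none */
	uint64_t queued_flags;   /* SRR1 flags for a queued program check */
};

static enum exit_action toy_handle_sc1(struct toy_vcpu *vcpu)
{
	vcpu->queued_vec = 0;
	vcpu->queued_flags = 0;

	/* sc 1 from the guest kernel is a genuine hcall. */
	if (!(vcpu->msr & MSR_PR))
		return HANDLE_HCALL;

	if (vcpu->may_run_pr_kvm) {
		/* Reflect as a system call so PR KVM in the guest can handle it. */
		vcpu->queued_vec = INTERRUPT_SYSCALL;
	} else {
		/* Behaviour of the quoted hunk: privileged program check. */
		vcpu->queued_vec = INTERRUPT_PROGRAM;
		vcpu->queued_flags = SRR1_PROGPRIV;
	}
	return RESUME_GUEST;
}
```

With may_run_pr_kvm set, a userspace sc 1 queues 0xc00 and resumes the guest; with it clear, the guest gets a 0x700 with SRR1_PROGPRIV. How the real code would decide between the two is left open here.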