On Wed, Jul 28, 2021, Paolo Bonzini wrote:
> On 28/07/21 14:39, Vitaly Kuznetsov wrote:
> > Shouldn't we also change kvm_arch_vcpu_runnable() and check
> > 'kvm_vcpu_running() > 0' now?
>
> I think leaving kvm_vcpu_block on error is the better choice, so it should
> be good with returning true if kvm_vcpu_running(vcpu) < 0.

Blech.  This is all gross.  There is a subtle bug lurking in both Jim's
approach and in this approach.  It's not detected because the selftest
exercises a bad PI descriptor, not a bad vAPIC page.

In Jim's approach of returning 'true' from kvm_vcpu_running() if
kvm_check_nested_events() fails due to vmx_complete_nested_posted_interrupt()
detecting a bad vAPIC page, the resulting KVM_EXIT_INTERNAL_ERROR will be
"lost" due to vmx->nested.pi_pending being cleared.  KVM runs the vCPU, but
skips over the PI check in inject_pending_event() due to
vmx->nested.pi_pending==false.  The selftest works because the bad PI
descriptor case is handled _before_ pi_pending is cleared.

This approach mostly fixes that bug by virtue of returning immediately in the
vcpu_run() case, but if the bad vAPIC page is encountered via
kvm_arch_vcpu_runnable(), KVM will effectively drop the error.

This can be hack-a-fixed by pre-checking the vAPIC page.  That's arguably
architecturally wrong as the vAPIC emulation access shouldn't occur until
after PI.ON is cleared, but from KVM's perspective I think it's the least
awful "fix" given the current train wreck.

Alternatively, what about punting all of this in favor of targeting the full
cleanup[*] for 5.15?  I believe I have the bandwidth to pick that up.

[*] https://lkml.kernel.org/r/YKWI1GPdNc4shaCt@xxxxxxxxxx

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0d0dd6580cfd..8d1c8217954a 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -3707,6 +3707,10 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 	if (!vmx->nested.pi_desc)
 		goto mmio_needed;
 
+	vapic_page = vmx->nested.virtual_apic_map.hva;
+	if (!vapic_page)
+		goto mmio_needed;
+
 	vmx->nested.pi_pending = false;
 
 	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
@@ -3714,10 +3718,6 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
 
 	max_irr = find_last_bit((unsigned long *)vmx->nested.pi_desc->pir, 256);
 	if (max_irr != 256) {
-		vapic_page = vmx->nested.virtual_apic_map.hva;
-		if (!vapic_page)
-			goto mmio_needed;
-
 		__kvm_apic_update_irr(vmx->nested.pi_desc->pir,
 			vapic_page, &max_irr);
 		status = vmcs_read16(GUEST_INTR_STATUS);
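
To make the ordering issue concrete, here is a minimal userspace sketch, not
kernel code.  The struct toy_vcpu, its fields, and both function names are
hypothetical stand-ins for the nested VMX state discussed above.  It assumes
PI.ON is set and the PIR is non-empty, so the vAPIC page is always needed,
and it uses -1 in place of the real error return.  It only illustrates why
checking the vAPIC page before clearing pi_pending keeps the error visible to
a later call.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; field names only mirror the discussion above. */
struct toy_vcpu {
	bool pi_pending;	/* models vmx->nested.pi_pending */
	bool pi_desc_ok;	/* models vmx->nested.pi_desc != NULL */
	bool vapic_ok;		/* models virtual_apic_map.hva != NULL */
};

enum { TOY_OK = 0, TOY_ERR = -1 };

/* Old ordering: the vAPIC page is checked after pi_pending is cleared. */
static inline int toy_complete_pi_late_check(struct toy_vcpu *v)
{
	assert(v != NULL);

	if (!v->pi_pending)
		return TOY_OK;
	if (!v->pi_desc_ok)
		return TOY_ERR;		/* pi_pending is still set here */

	v->pi_pending = false;

	if (!v->vapic_ok)
		return TOY_ERR;		/* error reported, but pi_pending is gone */
	return TOY_OK;
}

/* Patched ordering: the vAPIC page is pre-checked before pi_pending is cleared. */
static inline int toy_complete_pi_early_check(struct toy_vcpu *v)
{
	assert(v != NULL);

	if (!v->pi_pending)
		return TOY_OK;
	if (!v->pi_desc_ok)
		return TOY_ERR;
	if (!v->vapic_ok)
		return TOY_ERR;		/* pi_pending stays set, so a retry sees it */

	v->pi_pending = false;
	return TOY_OK;
}
```

In this model, calling toy_complete_pi_late_check() twice on a vCPU with a
bad vAPIC page fails the first time and succeeds the second time, which is
the "lost" error: if the first caller drops the result (as in the
kvm_arch_vcpu_runnable() path), nothing reports it later.  With
toy_complete_pi_early_check() the second call fails again.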