On Thu, May 20, 2021, Jim Mattson wrote:
> Don't allow posted interrupts to modify a stale posted interrupt
> descriptor (including the initial value of 0).
>
> Empirical tests on real hardware reveal that a posted interrupt
> descriptor referencing an unbacked address has PCI bus error semantics
> (reads as all 1's; writes are ignored). However, kvm can't distinguish
> unbacked addresses from device-backed (MMIO) addresses, so it should
> really ask userspace for an MMIO completion. That's overly
> complicated, so just punt with KVM_INTERNAL_ERROR.
>
> Don't return the error until the posted interrupt descriptor is
> actually accessed. We don't want to break the existing kvm-unit-tests
> that assume they can launch an L2 VM with a posted interrupt
> descriptor that references MMIO space in L1.
>
> Fixes: 6beb7bd52e48 ("kvm: nVMX: Refactor nested_get_vmcs12_pages()")
> Signed-off-by: Jim Mattson <jmattson@xxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx/nested.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> index 706c31821362..defd42201bb4 100644
> --- a/arch/x86/kvm/vmx/nested.c
> +++ b/arch/x86/kvm/vmx/nested.c
> @@ -3175,6 +3175,15 @@ static bool nested_get_vmcs12_pages(struct kvm_vcpu *vcpu)
>  				offset_in_page(vmcs12->posted_intr_desc_addr));
>  			vmcs_write64(POSTED_INTR_DESC_ADDR,
>  				     pfn_to_hpa(map->pfn) + offset_in_page(vmcs12->posted_intr_desc_addr));
> +		} else {
> +			/*
> +			 * Defer the KVM_INTERNAL_ERROR exit until
> +			 * someone tries to trigger posted interrupt
> +			 * processing on this vCPU, to avoid breaking
> +			 * existing kvm-unit-tests.

Run the lines out to 80 chars.  Also, can we change the comment to tie it to
CPU behavior in some way?  A few years down the road, "existing kvm-unit-tests"
may not have any relevant meaning, and it's not like kvm-unit-tests is bug-free
either.  E.g. something like

			/*
			 * Defer the KVM_INTERNAL_ERROR exit until posted
			 * interrupt processing actually occurs on this vCPU.
			 * Until that happens, the descriptor is not accessed,
			 * and userspace can technically rely on that behavior.
			 */

> +			 */
> +			vmx->nested.pi_desc = NULL;
> +			pin_controls_clearbit(vmx, PIN_BASED_POSTED_INTR);
>  		}
>  	}
>  	if (nested_vmx_prepare_msr_bitmap(vcpu, vmcs12))
> @@ -3689,10 +3698,14 @@ static int vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu)
>  	void *vapic_page;
>  	u16 status;
>
> -	if (!vmx->nested.pi_desc || !vmx->nested.pi_pending)
> +	if (!vmx->nested.pi_pending)
>  		return 0;
>
> +	if (!vmx->nested.pi_desc)
> +		goto mmio_needed;
> +
>  	vmx->nested.pi_pending = false;
> +
>  	if (!pi_test_and_clear_on(vmx->nested.pi_desc))
>  		return 0;
>
> --
> 2.31.1.818.g46aad6cb9e-goog
>
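
For anyone skimming the thread, the ordering the patch sets up (a missing descriptor is harmless until posted interrupt processing is actually pending) can be modelled outside the kernel. The following is only a standalone sketch: the toy_* types, the function name and the TOY_* return codes are all made up for illustration and are not KVM's structures or the values the real code returns.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical return codes; the real function's return values are not modelled here. */
enum { TOY_OK = 0, TOY_INTERNAL_ERROR = -1 };

/* Hypothetical stand-in for a posted interrupt descriptor. */
struct toy_pi_desc {
	bool on;	/* outstanding-notification bit */
};

/* Hypothetical stand-in for the nested state the patch touches. */
struct toy_nested {
	struct toy_pi_desc *pi_desc;	/* NULL when the descriptor could not be mapped */
	bool pi_pending;		/* posted interrupt processing was requested */
};

/*
 * Mirrors the check order in the patched function: nothing pending means
 * the descriptor is never looked at, so a NULL descriptor is not an error.
 * Only when processing is pending does the missing descriptor get reported.
 */
static int toy_complete_posted_interrupt(struct toy_nested *n)
{
	if (!n->pi_pending)
		return TOY_OK;

	if (!n->pi_desc)
		return TOY_INTERNAL_ERROR;

	n->pi_pending = false;

	/* Test and clear the notification bit, as the real code does via a helper. */
	if (!n->pi_desc->on)
		return TOY_OK;
	n->pi_desc->on = false;

	return TOY_OK;
}
```

With pi_desc == NULL and pi_pending == false the function returns TOY_OK, which corresponds to the "descriptor is not accessed" case in the suggested comment. Setting pi_pending to true with the descriptor still missing is the first point at which the error is reported.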