2017-11-10 18:06+0100, Paolo Bonzini: > On 09/11/2017 19:27, Liran Alon wrote: > > Consider the following scenario: > > 1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI > > to CPU B via virtual posted-interrupt mechanism. > > 2. CPU B is currently executing L2 guest. > > 3. vmx_deliver_nested_posted_interrupt() calls > > kvm_vcpu_trigger_posted_interrupt() which will note that > > vcpu->mode == IN_GUEST_MODE. > > 4. Assume that before CPU A sends the physical POSTED_INTR_NESTED_VECTOR > > IPI, CPU B exits from L2 to L0 during event-delivery > > (valid IDT-vectoring-info). > > 5. CPU A now sends the physical IPI. The IPI is received in host and > > it's handler (smp_kvm_posted_intr_nested_ipi()) does nothing. > > 6. Assume that before CPU A sets pi_pending=true and KVM_REQ_EVENT, > > CPU B continues to run in L0 and reach vcpu_enter_guest(). As > > KVM_REQ_EVENT is not set yet, vcpu_enter_guest() will continue and resume > > L2 guest. > > 7. At this point, CPU A sets pi_pending=true and KVM_REQ_EVENT but > > it's too late! CPU B already entered L2 and KVM_REQ_EVENT will only be > > consumed at next L2 entry! > > The bug is real (great debugging!) but I think the fix is wrong. The > basic issue is that we're not kicking the VCPU, so this should also fix it: > > /* the PIR and ON have been set by L1. */ > if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) { This would still fail on the exiting case. If one VCPU was just after a VM exit, then the sender would see it IN_GUEST_MODE, send the posted notification and return true, but the notification would do nothing and we didn't set vmx->nested.pi_pending = true, so the interrupt would not get handled until the next posted notification or nested VM entry. > /* > * If a posted intr is not recognized by hardware, > * we will accomplish it in the next vmentry. > */ > vmx->nested.pi_pending = true; > kvm_make_request(KVM_REQ_EVENT, vcpu); > kvm_vcpu_kick(vcpu); > } > > See the comments around the setting of IN_GUEST_MODE, introduced by > commit b95234c84004 ("kvm: x86: do not use KVM_REQ_EVENT for APICv > interrupt injection", 2017-02-15). > > Even though nested PI must use KVM_REQ_EVENT, the reasoning behind the > ordering of cli and vcpu->mode = IN_GUEST_MODE should hold in both cases. I think the fix is correct, but the code is using very awkward synchronization through vmx->nested.pi_pending and slaps KVM_REQ_EVENT on top. If we checked vmx->nested.pi_pending after cli, we could get rid of KVM_REQ_EVENT and if we want to support nested posted interrupts from PCI devices, we should explicitly check for pi.on during VM entry. Kicks do not seem necessary as the only case where we need the kick is the same case where kvm_vcpu_trigger_posted_interrupt() should just deliver the interrupt.