On 10/11/17 20:06, Radim Krčmář wrote:
2017-11-10 18:06+0100, Paolo Bonzini:
On 09/11/2017 19:27, Liran Alon wrote:
Consider the following scenario:
1. CPU A calls vmx_deliver_nested_posted_interrupt() to send an IPI
to CPU B via virtual posted-interrupt mechanism.
2. CPU B is currently executing L2 guest.
3. vmx_deliver_nested_posted_interrupt() calls
kvm_vcpu_trigger_posted_interrupt() which will note that
vcpu->mode == IN_GUEST_MODE.
4. Assume that before CPU A sends the physical POSTED_INTR_NESTED_VECTOR
IPI, CPU B exits from L2 to L0 during event-delivery
(valid IDT-vectoring-info).
5. CPU A now sends the physical IPI. The IPI is received in host and
it's handler (smp_kvm_posted_intr_nested_ipi()) does nothing.
6. Assume that before CPU A sets pi_pending=true and KVM_REQ_EVENT,
CPU B continues to run in L0 and reach vcpu_enter_guest(). As
KVM_REQ_EVENT is not set yet, vcpu_enter_guest() will continue and resume
L2 guest.
7. At this point, CPU A sets pi_pending=true and KVM_REQ_EVENT but
it's too late! CPU B already entered L2 and KVM_REQ_EVENT will only be
consumed at next L2 entry!
The bug is real (great debugging!) but I think the fix is wrong. The
basic issue is that we're not kicking the VCPU, so this should also fix it:
/* the PIR and ON have been set by L1. */
if (!kvm_vcpu_trigger_posted_interrupt(vcpu, true)) {
This would still fail on the exiting case.
If one VCPU was just after a VM exit, then the sender would see it
IN_GUEST_MODE, send the posted notification and return true, but the
notification would do nothing and we didn't set vmx->nested.pi_pending =
true, so the interrupt would not get handled until the next posted
notification or nested VM entry.
I agree. I was about to write to Paolo the exact same thing.
/*
* If a posted intr is not recognized by hardware,
* we will accomplish it in the next vmentry.
*/
vmx->nested.pi_pending = true;
kvm_make_request(KVM_REQ_EVENT, vcpu);
kvm_vcpu_kick(vcpu);
}
See the comments around the setting of IN_GUEST_MODE, introduced by
commit b95234c84004 ("kvm: x86: do not use KVM_REQ_EVENT for APICv
interrupt injection", 2017-02-15).
Even though nested PI must use KVM_REQ_EVENT, the reasoning behind the
ordering of cli and vcpu->mode = IN_GUEST_MODE should hold in both cases.
I think the fix is correct, but the code is using very awkward
synchronization through vmx->nested.pi_pending and slaps KVM_REQ_EVENT
on top.
If we checked vmx->nested.pi_pending after cli, we could get rid of
KVM_REQ_EVENT and if we want to support nested posted interrupts from
PCI devices, we should explicitly check for pi.on during VM entry.
I like this suggestion. If I understand correctly, these are the changes
I will do for v2 of this patch:
1. I Will change vmx_deliver_nested_posted_interrupt() to first set
pi_pending=true and then call kvm_vcpu_trigger_posted_interrupt(). No
need to set KVM_REQ_EVENT and no need to check
kvm_vcpu_trigger_posted_interrupt() ret-val.
2. I will create a kvm_x86_ops->complete_nested_posted_interrupt() that
will be called from vcpu_enter_guest() just after sync_pir_to_irr() is
called but outside the "if" for kvm_lapic_enabled().
3. I will remove call to vmx_complete_nested_posted_interrupt() from
vmx_check_nested_events(). That call-site had a bug anyway that I fixed
in another patch-series I posted here. This change will make that patch
(not the series) redundant. See redundant patch here:
https://marc.info/?l=kvm&m=150989090007281&w=2
I will prepare a v2 patch by this suggestion and send it.
Thanks!
-Liran
Kicks do not seem necessary as the only case where we need the kick is
the same case where kvm_vcpu_trigger_posted_interrupt() should just
deliver the interrupt.