On 12/05/17 10:16, Liran Alon wrote: > When L1 wants to send a posted-interrupt to another L1 CPU which runs > L2, it does the following operations: > 1. Sets the relevant bit in vmx->nested.pi_desc PIR. > 2. Set ON bit in vmx->nested.pi_desc->control field. > 3. Sends an IPI to dest L1 CPU by writing the the posted-interrupt > notification-vector into the LAPIC ICR. > > Step (3) will exit to L0 on APIC_WRITE which will eventually reach > vmx_deliver_nested_posted_interrupt(). If dest L0 CPU is in guest, > then a physical IPI of notification-vector will be sent which will > trigger evaluation of posted-interrupts and clear > vmx->nested.pi_desc->control ON bit. > Otherwise (or if dest L0 CPU exited from guest just before sending the > physical IPI), the nested-posted-interrupts will be evaluated on next > vmentry by vmx_complete_nested_posted_interrupt(). > > In order for vmx_complete_nested_posted_interrupt() to know if it > should do any work, the flag vmx->nested.pi_pending was used which > was set by vmx_deliver_nested_posted_interrupt(). > > However, this seems unnecessary because if posted-interrupts was not > processed yet, vmx->nested.pi_desc->control ON bit should still be > set. Therefore, it should suffice to use it in order to know if work > should be done. > > Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx> > Reviewed-by: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx> > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx> > Signed-off-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@xxxxxxxxxx> > --- > arch/x86/kvm/vmx.c | 6 +----- > 1 file changed, 1 insertion(+), 5 deletions(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 714a0673ec3c..f5074ec5701b 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -453,7 +453,6 @@ struct nested_vmx { > struct page *virtual_apic_page; > struct page *pi_desc_page; > struct pi_desc *pi_desc; > - bool pi_pending; > u16 posted_intr_nv; > > unsigned long *msr_bitmap; > @@ -5050,10 +5049,9 @@ static void vmx_complete_nested_posted_interrupt(struct kvm_vcpu *vcpu) > void *vapic_page; > u16 status; > > - if (!vmx->nested.pi_desc || !vmx->nested.pi_pending) > + if (!vmx->nested.pi_desc) > return; > > - vmx->nested.pi_pending = false; > if (!pi_test_and_clear_on(vmx->nested.pi_desc)) > return; > > @@ -5126,7 +5124,6 @@ static int vmx_deliver_nested_posted_interrupt(struct kvm_vcpu *vcpu, > * If a posted intr is not recognized by hardware, > * we will accomplish it in the next vmentry. > */ > - vmx->nested.pi_pending = true; > kvm_make_request(KVM_REQ_EVENT, vcpu); > return 0; > } > @@ -10488,7 +10485,6 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, struct vmcs12 *vmcs12, > /* Posted interrupts setting is only taken from vmcs12. */ > if (nested_cpu_has_posted_intr(vmcs12)) { > vmx->nested.posted_intr_nv = vmcs12->posted_intr_nv; > - vmx->nested.pi_pending = false; > vmcs_write16(POSTED_INTR_NV, POSTED_INTR_NESTED_VECTOR); > } else { > exec_control &= ~PIN_BASED_POSTED_INTR; > It turns out this patch is incorrect and should be removed from series. I will remove it from next version of this patch series. Just didn't want you to queue it by mistake. :) I figured that pi_pending is used as a flag to indicate if L1 has sent a posted-interrupt notification-vector such that KVM won't process the nested-posted-interrupts on vmentry in case L1 has already set the ON bit in vmx->nested.pi_desc->control field but not sent the notification-vector yet. That will break vCPU semantics. (This is in contrast to how vmx_sync_pir_to_irr() does ignore whether a posted-interrupt notification-vector was sent or not and just consume posted-interrupts on each vmentry if the vmx->pi_desc.control ON bit is set. That is fine because in this case L0 KVM defines how it wishes to handle posted-interrupts). Regards, -Liran