Re: [PATCH v2 6/6] x86, apicv: Add Posted Interrupt supporting

Gleb Natapov <gleb@xxxxxxxxxx> · Tue, 27 Nov 2012 13:46:58 +0200

On Tue, Nov 27, 2012 at 11:10:20AM +0000, Zhang, Yang Z wrote:
> Gleb Natapov wrote on 2012-11-27:
> > On Tue, Nov 27, 2012 at 03:38:05AM +0000, Zhang, Yang Z wrote:
> >> Gleb Natapov wrote on 2012-11-26:
> >>> On Mon, Nov 26, 2012 at 12:29:54PM +0000, Zhang, Yang Z wrote:
> >>>> Gleb Natapov wrote on 2012-11-26:
> >>>>> On Mon, Nov 26, 2012 at 03:51:04AM +0000, Zhang, Yang Z wrote:
> >>>>>> Gleb Natapov wrote on 2012-11-25:
> >>>>>>> On Wed, Nov 21, 2012 at 04:09:39PM +0800, Yang Zhang wrote:
> >>>>>>>> Posted Interrupt allows vAPICV interrupts to inject into guest directly
> >>>>>>>> without any vmexit.
> >>>>>>>> 
> >>>>>>>> - When delivering a interrupt to guest, if target vcpu is running,
> >>>>>>>>   update Posted-interrupt requests bitmap and send a notification
> >>>>>>>>   event to the vcpu. Then the vcpu will handle this interrupt
> >>>>>>>>   automatically, without any software involvemnt.
> >>>>>>> Looks like you allocating one irq vector per vcpu per pcpu and then
> >>>>>>> migrate it or reallocate when vcpu move from one pcpu to another.
> >>>>>>> This is not scalable and migrating irq migration slows things down.
> >>>>>>> What's wrong with allocating one global vector for posted interrupt
> >>>>>>> during vmx initialization and use it for all vcpus?
> >>>>>> 
> >>>>>> Consider the following situation:
> >>>>>> If vcpu A is running when notification event which belong to vcpu B is
> >>> arrived,
> >>>>> since the vector match the vcpu A's notification vector, then this event
> >>>>> will be consumed by vcpu A(even it do nothing) and the interrupt cannot
> >>>>> be handled in time. The exact same situation is possible with your code.
> >>>>> vcpu B can be migrated from pcpu and vcpu A will take its place and will
> >>>>> be assigned the same vector as vcpu B. But I fail to see why is this a
> >>>> No, the on bit will be set to prevent notification event when vcpu B start
> >>> migration. And it only free the vector before it going to run in another pcpu.
> >>> There is a race. Sender check on bit, vcpu B migrate to another pcpu and
> >>> starts running there, vcpu A takes vpuc's B vector, sender send PI vcpu
> >>> A gets it.
> >> Yes, it do exist. But I think it should be ok even this happens.
> >> 
> > Then it is OK to use global PI vector. The race should be dealt with
> > anyway.
> Or using lock can deal with it too.
>  
Last thing we want is to hold lock while injecting interrupt. Also Vt-d
HW will not be able to use the lock.

> >>>> 
> >>>>> problem. vcpu A will ignore PI since pir will be empty and vcpu B should
> >>>>> detect new event during next vmentry.
> >>>> Yes, but the next vmentry may happen long time later and interrupt cannot
> > be
> >>> serviced until next vmentry. In current way, it will cause vmexit and
> >>> re-schedule the vcpu B. Vmentry will happen when scheduler will decide
> >>> that vcpu can run. There
> >> I don't know how scheduler can know the vcpu can run in this case, can
> >> you elaborate it? I thought it may have problems with global vector in
> >> some cases(maybe I am wrong, since I am not familiar with KVM
> >> scheduler): If target VCPU is in idle, and this notification event is
> >> consumed by other VCPU,
> > then how can scheduler know the vcpu is ready to run? Even there is a way for
> > scheduler to know, then when? Isn't it too late?
> >> If notification event arrived in hypervisor, then how the handler know which
> > VCPU the notification event belong to?
> > When vcpu is idle its thread sleeps inside host kernel (see
> > virt/kvm/kvm_main.c:kvm_vcpu_block()). To get it out of sleep
> > you should call kvm_vcpu_kick(), but only after changing vcpu
> > state to make it runnable. arch/x86/kvm/x86.c:kvm_arch_vcpu_runnable()
> > checks if vcpu is runnable. Notice that we call kvm_cpu_has_interrupt()
> > there which checks apic IRR, but not PIR, so it is not enough to set
> > bit in PIR and call kvm_vcpu_kick() to wake up vcpu.
> Sorry, I cannot understand it. As you said, we need to call kvm_vcpu_kick when the waiting event happened to wake up the blocked vcpu, but this event is consumed by other vcpu without any chance for us to kick it. Then how it will move out from blocked list to run queue.
> BTW, what I am talking is for the interrupt from VT-d case. For virtual interrupt, I think global vector is ok.
> Also, the second problem is also about the VT-d case.
> When cpu is running in VM root mode, and then an notification event arrives, since all VCPU use the same notification vector, we cannot distinguish which VCPU the notification event want to deliver to. And we cannot put the right VCPU to run queue.
>  
There is not VT-d code in proposed patches (is there spec available about
how VT-d integrates with PI?), so discussion is purely theoretical. VT-d
device will have to be reprogrammed to generate regular interrupt when
vcpu thread goes to sleep. This interrupt will be injected by VFIO via
irqfd just like assigned devices do now. When vcpu becomes runnable VT-d
device is reprogrammed back to generate PI.

> >> 
> >>> is no problem here. What you probably want to say is that vcpu may not be
> >>> aware of interrupt happening since it was migrated to different vcpu
> >>> just after PI IPI was sent and thus missed it. But than PIR interrupts
> >>> should be processed during vmentry on another pcpu:
> >>> 
> >>> Sender:                             Guest:
> >>> 
> >>> set pir
> >>> set on
> >>> if (vcpu in guest mode on pcpu1)
> >>>                                    vmexit on pcpu1
> >>>                                    vmentry on pcpu2
> >>>                                    process pir, deliver interrupt
> >>> 	send PI IPI to pcpu1
> >> 
> >>>> 
> >>>>>> 
> >>>>>>> 
> >>>>>>>> +	if (!cfg) {
> >>>>>>>> +		free_irq_at(irq, NULL);
> >>>>>>>> +		return 0;
> >>>>>>>> +	}
> >>>>>>>> +
> >>>>>>>> +	raw_spin_lock_irqsave(&vector_lock, flags);
> >>>>>>>> +	if (!__assign_irq_vector(irq, cfg, mask))
> >>>>>>>> +		ret = irq;
> >>>>>>>> +	raw_spin_unlock_irqrestore(&vector_lock, flags);
> >>>>>>>> +
> >>>>>>>> +	if (ret) {
> >>>>>>>> +		irq_set_chip_data(irq, cfg);
> >>>>>>>> +		irq_clear_status_flags(irq, IRQ_NOREQUEST);
> >>>>>>>> +	} else {
> >>>>>>>> +		free_irq_at(irq, cfg);
> >>>>>>>> +	}
> >>>>>>>> +	return ret;
> >>>>>>>> +}
> >>>>>>> 
> >>>>>>> This function is mostly cut&paste of create_irq_nr().
> >>>>>> 
> >>>>>> Yes, this function allow to allocate vector from specified cpu.
> >>>>>> 
> >>>>> Does not justify code duplication.
> >>>> ok. will change it in next version.
> >>>> 
> >>> Please use single global vector in the next version.
> >>> 
> >>>>>>>> 
> >>>>>>>>  	if (kvm_x86_ops->has_virtual_interrupt_delivery(vcpu))
> >>>>>>>>  		apic->vid_enabled = true;
> >>>>>>>> +
> >>>>>>>> +	if (kvm_x86_ops->has_posted_interrupt(vcpu))
> >>>>>>>> +		apic->pi_enabled = true;
> >>>>>>>> +
> >>>>>>> This is global state, no need per apic variable.
> >>>> Even all vcpus use the same setting, but according to SDM, apicv really is a
> > per
> >>> apic variable.
> >>> It is not per vapic in our implementation and this is what is
> >>> important here.
> >>> 
> >>>> Anyway, if you think we should not put it here, where is the best place?
> >>> It is not needed, just use has_posted_interrupt(vcpu) instead.
> >> ok
> >> 
> >>>> 
> >>>>>>>> @@ -1555,6 +1611,11 @@ static void vmx_vcpu_load(struct kvm_vcpu
> >>>>> *vcpu,
> >>>>>>> int cpu)
> >>>>>>>>  		struct desc_ptr *gdt = &__get_cpu_var(host_gdt);
> >>>>>>>>  		unsigned long sysenter_esp;
> >>>>>>>> +		if (enable_apicv_pi && to_vmx(vcpu)->pi)
> >>>>>>>> +			pi_set_on(to_vmx(vcpu)->pi);
> >>>>>>>> +
> >>>>>>> Why?
> >>>>>> 
> >>>>>> Here means the vcpu start migration. So we should prevent the
> >>>>>> notification event until migration end.
> >>>>>> 
> >>>>> You check for IN_GUEST_MODE while sending notification. Why is this not
> >>>> For interrupt from emulated device, it enough. But VT-d device doesn't
> > know
> >>> the vcpu is migrating, so set the on bit to prevent the notification event when
> >>> target vcpu is migrating.
> >>> Why should VT-d device care about that? It sets bits in pir and sends
> >>> IPI. If vcpu is running it process pir immediately, if not it will do it
> >>> during next vmentry.
> >> We already know the vcpu is not running(it will run soon), we can set this bit to
> > prevent the unnecessary IPI. We have IN_GUEST_MODE for that. And this is
> > the wrong place to indicate that vcpu is not running anyway. vcpu is not
> > running immediately after vmexit.
> But the VT-d chipset doesn't know. We need to set this bit to tell it.
>  
You are trying to optimize here. Let it generate interrupt that will be
ignored and optimize later. Setting bit here does not make sense for
optimization you are trying to do because call to vcpu_load() means that
vcpu will likely enter guest mode in the near feature, but it was not
running before that point, so setting it here is kinda later.

> >> 
> >>>> 
> >>>>> enough? Also why vmx_vcpu_load() call means that vcpu start migration?
> >>>> I think the follow check can ensure the vcpu is in migration, am I wrong?
> >>>> if (vmx->loaded_vmcs->cpu != cpu)
> >>> This code checks that this vcpu ran on that pcpu last time.
> >> Yes, migration starts more earlier than here. But I think it should be
> >> ok to set ON bit here. Do you have any better idea?
> >> 
> > If you want to prevent assigned device from sending IPI to non running
> > vcpu you should set the bit immediately after vmexit. For emulated
> > devices vcpu->mode should be used.
> No, if the reason of vmexit is waiting for interrupt from assigned device , then it will never have the chance to get this interrupt.
>  
Exactly what will happen with your code. Consider vcpu that executed HLT
instruction. If, for some reason, it does exit to userspace while
halted, on the next ioctl(VM_RUN) "on" will be set by vmx_vcpu_load()
vcpu thread will return to kvm_vcpu_block() and VT-d interrupt will
never be triggered.

As I said above solution may be to reprogram VT-d device to regular interrupt
while vcpu is halted.

> >>>> {
> >>>> 	if (enable_apicv_pi && to_vmx(vcpu)->pi)
> >>>> 	pi_set_on(to_vmx(vcpu)->pi);
> >>>> }
> >>>> 
> >>>>>>>> +		kvm_make_request(KVM_REQ_POSTED_INTR, vcpu);
> >>>>>>>> +
> >>>>>>>>  		kvm_make_request(KVM_REQ_TLB_FLUSH, vcpu);
> >>>>>>>>  	local_irq_disable();
> >>>>>>>>  		list_add(&vmx->loaded_vmcs->loaded_vmcss_on_cpu_link, @@
> >>>>>>>>  -1582,6 +1643,8 @@ static void vmx_vcpu_put(struct kvm_vcpu
> >>>>>>>>  *vcpu) 	vcpu->cpu = -1; 		kvm_cpu_vmxoff(); 	}
> >>>>>>>> +	if (enable_apicv_pi && to_vmx(vcpu)->pi)
> >>>>>>>> +		pi_set_on(to_vmx(vcpu)->pi);
> >>>>>>> Why?
> >>>>>> 
> >>>>>> When vcpu schedule out, no need to send notification event to it, just set
> >>> the
> >>>>> PIR and wakeup it is enough.
> >>>>> Same as above. When vcpu is scheduled out it will no be in
> > IN_GUEST_MODE
> >>>> Right.
> >>>> 
> >>>>> mode. Also in this case we probably should set bit directly in IRR and leave
> >>>>> PIR alone.
> >>>>> From the view of hypervisor, IRR and PIR are same. For each vmentry, if PI
> > is
> >>> enabled, the IRR equal to (IRR | PIR). So there is no difference to
> >>> set IRR or PIR if target vcpu is not running. But there is a
> >>> difference for KVM code. For instance kvm_arch_vcpu_runnable() checks
> >>> for interrupts in IRR, but not PIR. Migration code does the same.
> >> Right. With PI, we need check the PIR too.
> >> 
> >>>> 
> >>>>> 
> >>>>>> 
> >>>>>>>>  }
> >>>>>>>>  
> >>>>>>>>  static void vmx_fpu_activate(struct kvm_vcpu *vcpu)
> >>>>>>>> @@ -2451,12 +2514,6 @@ static __init int setup_vmcs_config(struct
> >>>>>>> vmcs_config *vmcs_conf)
> >>>>>>>>  	u32 _vmexit_control = 0;
> >>>>>>>>  	u32 _vmentry_control = 0;
> >>>>>>>> -	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING; -	opt =
> >>>>>>>> PIN_BASED_VIRTUAL_NMIS; -	if (adjust_vmx_controls(min, opt,
> >>>>>>>> MSR_IA32_VMX_PINBASED_CTLS, -				&_pin_based_exec_control) < 0)
> >>>>>>>> -		return -EIO; -
> >>>>>>>>  	min = CPU_BASED_HLT_EXITING |
> >>>>>>>>  #ifdef CONFIG_X86_64
> >>>>>>>>  	      CPU_BASED_CR8_LOAD_EXITING |
> >>>>>>>> @@ -2531,6 +2588,17 @@ static __init int setup_vmcs_config(struct
> >>>>>>> vmcs_config *vmcs_conf)
> >>>>>>>>  				&_vmexit_control) < 0)
> >>>>>>>>  		return -EIO;
> >>>>>>>> +	min = PIN_BASED_EXT_INTR_MASK | PIN_BASED_NMI_EXITING; +	opt =
> >>>>>>>> PIN_BASED_VIRTUAL_NMIS | PIN_BASED_POSTED_INTR; +	if
> >>>>>>>> (adjust_vmx_controls(min, opt, MSR_IA32_VMX_PINBASED_CTLS,
> >>>>>>>> +				&_pin_based_exec_control) < 0) +		return -EIO; + +	if
> >>>>>>>> (!(_cpu_based_2nd_exec_control &
> >>>>>>>> +		SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY) || +		!(_vmexit_control
> >>>>>>>> & VM_EXIT_ACK_INTR_ON_EXIT)) +		_pin_based_exec_control &=
> >>>>>>>> ~PIN_BASED_POSTED_INTR; +
> >>>>>>>>  	min = 0; 	opt = VM_ENTRY_LOAD_IA32_PAT; 	if
> >>>>>>>>  (adjust_vmx_controls(min, opt, MSR_IA32_VMX_ENTRY_CTLS, @@
> >>>>>>>>  -2715,6 +2783,9 @@ static __init int hardware_setup(void) 	if
> >>>>>>>>  (!cpu_has_vmx_virtual_intr_delivery()) 		enable_apicv_vid =
> > 0;
> >>>>>>>> +	if (!cpu_has_vmx_posted_intr() || !x2apic_enabled())
> >>>>>>> In nested guest x2apic may be enabled without irq remapping. Check for
> >>>>>>> irq remapping here.
> >>>>>> 
> >>>>>> There are no posted interrupt available in nested case. We don't need
> >>>>>> to check IR here.
> >>>>>> 
> >>>>> One day emulation will be added. If pre-request for PI is IR check
> >>>>> for IR.
> >>>> 
> >>>> 
> >>>>> BTW why IR is needed for PI. To deliver assigned devices interrupts
> >>>>> directly into a guest sure, but why is it required for delivering
> >>>>> interrupts from emulated devices or IPIs?
> >>>> Posted Interrupt support is Xeon only and these platforms will have x2APIC.
> > So,
> >>> Linux will enable x2APIC on these platforms. So we only want to enable
> >>> PI when x2apic is enabled and IR is required for x2apic. The fact that
> >>> x2APIC is available on all platform that support PI is irrelevant. If
> >>> one is not strictly required by the other by architecture do not
> >>> couple them.
> >> Right. We only want to simply the implementation of enable "ack intr on exit".
> > If IR enabled, then don't need to check the trig mode(all interrupts are edge)
> > when using self IPI to regenerate the interrupt.
> > With Avi's suggestion self IPI is not needed. Drop this dependency if it
> > is not architectural.
> Ok, then we need to read the TMR and only send self IPI for edge interrupt.  
>  
Sounds OK. It can be optimized later by checking with ioapic code. The
configuration rarely changes, so it is possible to avoid reading TMR
register on each interrupt.

> >> 
> >>>> 
> >>>>>>> 
> >>>>>>>> +		enable_apicv_pi = 0;
> >>>>>>>> +
> >>>>>>>>  	if (nested) 		nested_vmx_setup_ctls_msrs(); @@ -3881,6 +3952,93
> >>>>>>>>  @@ static void ept_set_mmio_spte_mask(void)
> >>>>>>>>  	kvm_mmu_set_mmio_spte_mask(0xffull << 49 | 0x6ull); }
> >>>>>>>> +irqreturn_t pi_handler(int irq, void *data)
> >>>>>>>> +{
> >>>>>>>> +	struct vcpu_vmx *vmx = data;
> >>>>>>>> +
> >>>>>>>> +	kvm_make_request(KVM_REQ_EVENT, &vmx->vcpu);
> >>>>>>>> +	kvm_vcpu_kick(&vmx->vcpu);
> >>>>>>>> +
> >>>>>>>> +	return IRQ_HANDLED;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static int vmx_has_posted_interrupt(struct kvm_vcpu *vcpu)
> >>>>>>>> +{
> >>>>>>>> +	return irqchip_in_kernel(vcpu->kvm) && enable_apicv_pi;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static void vmx_pi_migrate(struct kvm_vcpu *vcpu)
> >>>>>>>> +{
> >>>>>>>> +	int ret = 0;
> >>>>>>>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >>>>>>>> +
> >>>>>>>> +	if (!enable_apicv_pi)
> >>>>>>>> +		return ;
> >>>>>>>> +
> >>>>>>>> +	preempt_disable();
> >>>>>>>> +	local_irq_disable();
> >>>>>>>> +	if (!vmx->irq) {
> >>>>>>>> +		ret = arch_pi_alloc_irq(vmx);
> >>>>>>>> +		if (ret < 0) {
> >>>>>>>> +			vmx->irq = -1;
> >>>>>>>> +			goto out;
> >>>>>>>> +		}
> >>>>>>>> +		vmx->irq = ret;
> >>>>>>>> +
> >>>>>>>> +		ret = request_irq(vmx->irq, pi_handler, IRQF_NO_THREAD,
> >>>>>>>> +					"Posted Interrupt", vmx);
> >>>>>>>> +		if (ret) {
> >>>>>>>> +			vmx->irq = -1;
> >>>>>>>> +			goto out;
> >>>>>>>> +		}
> >>>>>>>> +
> >>>>>>>> +		ret = arch_pi_get_vector(vmx->irq);
> >>>>>>>> +	} else
> >>>>>>>> +		ret = arch_pi_migrate(vmx->irq, smp_processor_id());
> >>>>>>>> +
> >>>>>>>> +	if (ret < 0) {
> >>>>>>>> +		vmx->irq = -1;
> >>>>>>>> +		goto out;
> >>>>>>>> +	} else {
> >>>>>>>> +		vmx->vector = ret;
> >>>>>>>> +		vmcs_write16(POSTED_INTR_NV, vmx->vector);
> >>>>>>>> +		pi_clear_on(vmx->pi);
> >>>>>>>> +	}
> >>>>>>>> +out:
> >>>>>>>> +	local_irq_enable();
> >>>>>>>> +	preempt_enable();
> >>>>>>>> +	return ;
> >>>>>>>> +}
> >>>>>>>> +
> >>>>>>>> +static int vmx_send_nv(struct kvm_vcpu *vcpu,
> >>>>>>>> +		int vector)
> >>>>>>>> +{
> >>>>>>>> +	struct vcpu_vmx *vmx = to_vmx(vcpu);
> >>>>>>>> +
> >>>>>>>> +	if (unlikely(vmx->irq == -1))
> >>>>>>>> +		return 0;
> >>>>>>>> +
> >>>>>>>> +	if (vcpu->cpu == smp_processor_id()) {
> >>>>>>>> +		pi_set_on(vmx->pi);
> >>>>>>> Why? You clear this bit anyway in vmx_update_irq() during guest entry.
> >>>>>> Here means the target vcpu already in vmx non-root mode. Then it will
> >>>>> consume the interrupt on next vm entry and we don't need to send the
> >>>>> notification event from other cpu, just update PIR is enough.
> >>>>> I understand why you avoid sending PI IPI here, but you do not update
> >>>>> pir in this case either. You only set "on" bit here and set vector directly
> >>>>> in IRR in __apic_accept_irq() since vmx_send_nv() returns 0 in this case.
> >>>>> Interrupt is delivered from IRR on the next entry.
> >>>> As I mentioned, IRR is basically same as PIR.
> >>>> 
> >>> That does not explain why are you setting "on" without setting bit in pir.
> >> Right. Just set the PIR and return 1 is enough.
> >> 
> >>>>>> 
> >>>>>>>> +		return 0; +	} + +	pi_set_pir(vector, vmx->pi); +	if
> >>>>>>>> (!pi_test_and_set_on(vmx->pi) && (vcpu->mode == IN_GUEST_MODE)) {
> >>>>>>>> +		apic->send_IPI_mask(get_cpu_mask(vcpu->cpu), vmx->vector);
> >>>>>>>> +		return 1; +	} +	return 0; +} + +static void free_pi(struct
> >>>>>>>> vcpu_vmx *vmx) +{ +	if (enable_apicv_pi) { + 	kfree(vmx->pi);
> >>>>>>>> +		arch_pi_free_irq(vmx->irq, vmx); +	} +} +
> >>>>>>>>  /*
> >>>>>>>>   * Sets up the vmcs for emulated real mode.
> >>>>>>>>   */
> >>>>>>>> @@ -3890,6 +4048,7 @@ static int vmx_vcpu_setup(struct vcpu_vmx
> >>> *vmx)
> >>>>>>>>  	unsigned long a;
> >>>>>>>>  #endif
> >>>>>>>>  	int i;
> >>>>>>>> +	u32 pin_based_exec_ctrl = vmcs_config.pin_based_exec_ctrl;
> >>>>>>>> 
> >>>>>>>>  	/* I/O */ 	vmcs_write64(IO_BITMAP_A, __pa(vmx_io_bitmap_a)); @@
> >>>>>>>>  -3901,8 +4060,10 @@ static int vmx_vcpu_setup(struct vcpu_vmx
> >>>>>>>>  *vmx) 	vmcs_write64(VMCS_LINK_POINTER, -1ull); /* 22.3.1.5 */
> >>>>>>>>  
> >>>>>>>>  	/* Control */
> >>>>>>>> -	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL,
> >>>>>>>> -		vmcs_config.pin_based_exec_ctrl); +	if (!enable_apicv_pi)
> >>>>>>>> +		pin_based_exec_ctrl &= ~PIN_BASED_POSTED_INTR; +
> >>>>>>>> +	vmcs_write32(PIN_BASED_VM_EXEC_CONTROL, pin_based_exec_ctrl);
> >>>>>>>> 
> >>>>>>>>  	vmcs_write32(CPU_BASED_VM_EXEC_CONTROL,
> >>>>>>> vmx_exec_control(vmx));
> >>>>>>>> 
> >>>>>>>> @@ -3920,6 +4081,12 @@ static int vmx_vcpu_setup(struct vcpu_vmx
> >>>>> *vmx)
> >>>>>>>>  		vmcs_write16(GUEST_INTR_STATUS, 0);
> >>>>>>>>  	}
> >>>>>>>> +	if (enable_apicv_pi) { +		vmx->pi = kmalloc(sizeof(struct
> >>>>>>>> pi_desc), +				GFP_KERNEL | __GFP_ZERO);
> >>>>>>>> +		vmcs_write64(POSTED_INTR_DESC_ADDR, __pa((vmx->pi))); +	} +
> >>>>>>>>  	if (ple_gap) { 		vmcs_write32(PLE_GAP, ple_gap);
> >>>>>>>>  		vmcs_write32(PLE_WINDOW, ple_window); @@ -6161,6 +6328,11 @@
> >>>>>>>>  static void vmx_update_irq(struct kvm_vcpu *vcpu) 	if
> >>>>>>>>  (!enable_apicv_vid) 		return ;
> >>>>>>>> +	if (enable_apicv_pi) {
> >>>>>>>> +		kvm_apic_update_irr(vcpu, (unsigned int *)vmx->pi->pir);
> >>>>>>>> +		pi_clear_on(vmx->pi);
> >>>>>>> Why do you do that? Isn't VMX process posted interrupts on vmentry if
> >>>>>>> "on" bit is set?
> >>>>> Can you answer this question?
> >>>> No, vmentry do nothing for PI. Posted interrupt only happens when an
> >>> unmasked external interrupt arrived and the target vcpu is running.
> >>> Beyond that, cpu follow the old way.
> >>>> 
> >>> Now that totally contradicts what you wrote above! (unless I
> >>> misunderstood you, in which case clarify please)
> >>> 
> >>>   From the view of hypervisor, IRR and PIR are same. For each vmentry, if
> >>>   PI is enabled, the IRR equal to (IRR | PIR). So there is no difference
> >>>   to set IRR or PIR if target vcpu is not running.
> >>> --
> >>> 			Gleb.
> >> Sorry, maybe I mislead you. VMentry have nothing to do PI.
> >> What I mean" IRR and PIR are same" is because this patch will copy PIR to IRR
> > before each vmentry. So I think this two should be same in some levels. But
> > according your comments, it may wrong.
> >> 
> > OK. Thanks for clarification.
> > 
> > --
> > 			Gleb.
> > --
> > To unsubscribe from this list: send the line "unsubscribe kvm" in
> > the body of a message to majordomo@xxxxxxxxxxxxxxx
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> Best regards,
> Yang
> 

--
			Gleb.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html