Gleb Natapov wrote on 2012-12-05:
> On Wed, Dec 05, 2012 at 01:55:17AM +0000, Zhang, Yang Z wrote:
>> Gleb Natapov wrote on 2012-12-04:
>>> On Tue, Dec 04, 2012 at 06:39:50AM +0000, Zhang, Yang Z wrote:
>>>> Gleb Natapov wrote on 2012-12-03:
>>>>> On Mon, Dec 03, 2012 at 03:01:03PM +0800, Yang Zhang wrote:
>>>>>> Virtual interrupt delivery avoids KVM having to inject vAPIC
>>>>>> interrupts manually, which is fully taken care of by the hardware.
>>>>>> This needs some special awareness in the existing interrupt
>>>>>> injection path:
>>>>>>
>>>>>> - for a pending interrupt, instead of direct injection, we may need
>>>>>>   to update architecture-specific indicators before resuming to the
>>>>>>   guest.
>>>>>> - a pending interrupt that is masked by the ISR should also be
>>>>>>   considered in the above update action, since hardware will decide
>>>>>>   when to inject it at the right time. Currently, has_interrupt and
>>>>>>   get_interrupt only return a valid vector from the injection p.o.v.
>>>>> Most of my previous comments still apply.
>>>>>
>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
>>>>>> +        int trig_mode, int always_set)
>>>>>> +{
>>>>>> +    if (kvm_x86_ops->set_eoi_exitmap)
>>>>>> +        kvm_x86_ops->set_eoi_exitmap(vcpu, vector,
>>>>>> +                trig_mode, always_set);
>>>>>> +}
>>>>>> +
>>>>>>  /*
>>>>>>   * Add a pending IRQ into lapic.
>>>>>>   * Return 1 if successfully added and 0 if discarded.
>>>>>> @@ -661,6 +669,7 @@ static int __apic_accept_irq(struct kvm_lapic *apic, int delivery_mode,
>>>>>>      if (unlikely(!apic_enabled(apic)))
>>>>>>          break;
>>>>>> +    kvm_set_eoi_exitmap(vcpu, vector, trig_mode, 0);
>>>>> As I said in the last review, rebuild the bitmap when the ioapic or
>>>>> irq notifier configuration changes, and use a request bit to notify
>>>>> vcpus to reload the bitmap.
>>>> It is too complicated. When programming an ioapic entry, we cannot
>>>> get the target vcpu easily. We need to read the destination format
>>>> register and the logical destination register to find out the target
>>>> vcpu when logical mode is used. Also, we must trap every modification
>>>> to those two registers to update the eoi bitmap.
>>> No need to check the target vcpu. Enable the exit on all vcpus for
>>> the vector
>> This is wrong. As we know, modern OSes use per-VCPU vectors. We cannot
>> ensure all vectors have the same trigger mode. And what's worse, if the
>> vector on another vcpu is used to handle high-frequency interrupts
>> (like a 10G NIC), it will hurt performance.
>>
> I never saw an OS reuse a vector used by the ioapic; as far as I see
> this
Could you point out which code does this check in the Linux kernel? I
don't see any special check when the Linux kernel allocates a vector.

> is not how the Linux code works. Furthermore it will not work with KVM
> currently, since an apic eoi is redirected to the ioapic based on the
> vector alone, not a vector/vcpu pair, and as far as I am aware this is
> how real HW works.
Yes, real HW works this way. But why is that helpful in this case?

>>> programmed into the ioapic. Which two registers? All accesses to the
>>> ioapic are trapped and reconfiguration is rare.
>> In logical mode, the destination VCPU depends on each CPU's destination
>> format register and logical destination register. So we must also trap
>> those two registers. And if lowest-priority delivery mode is used, the
>> PPR needs to be trapped too. Since the PPR changes on each interrupt
>> injection, the cost would be higher than the current approach.
> No need for all of that if the bitmask is global.
No, the bitmask is per VCPU. Also, why would it work if the bitmask were
global?
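To make the programming-time cost concrete, here is a rough sketch of
what resolving the target vcpus for a logical-mode ioapic entry would
have to do (not from the patch; the loop and its placement are
hypothetical, while kvm_for_each_vcpu() and
kvm_apic_match_logical_addr() are existing helpers):

    /*
     * Hypothetical sketch: when the guest programs a logical-mode
     * ioapic redirection entry, walk all vcpus and match each one's
     * DFR/LDR state against the entry's destination, updating the
     * matching vcpus' eoi bitmaps.  Any later guest write to DFR or
     * LDR invalidates this result, which is why those registers would
     * have to be trapped as well.
     */
    kvm_for_each_vcpu(i, vcpu, kvm) {
        if (kvm_apic_match_logical_addr(vcpu->arch.apic,
                                        entry->fields.dest_id))
            kvm_set_eoi_exitmap(vcpu, entry->fields.vector,
                                entry->fields.trig_mode, 0);
    }

And with lowest-priority delivery the destination additionally depends
on each vcpu's PPR at delivery time, which cannot be known when the
entry is programmed.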
>>
>>>> For the irq notifier, only the PIT is special: it is edge triggered
>>>> but needs an EOI notifier. So, just treat it specially. And the TMR
>>>> can cover the others.
>>>>
>>> We shouldn't assume that. If another notifier is added it will be
>>> easy to forget to update the apicv code to exclude another vector too.
>> At this point the guest is not running (it is in device
>> initialization), so we cannot know the vector. As you mentioned, the
>> best point is when the guest programs the ioapic entry, but it is also
>> impossible to get the vector there (see above). I can add some comments
>> on the function to remind callers to update the eoi bitmap when the
>> interrupt is edge triggered and they still want to get an EOI vmexit.
>>
>>>>>
>>>>>>      if (trig_mode) {
>>>>>>          apic_debug("level trig mode for vector %d", vector);
>>>>>>          apic_set_vector(vector, apic->regs + APIC_TMR);
>>>>>> @@ -740,6 +749,19 @@ int kvm_apic_compare_prio(struct kvm_vcpu *vcpu1, struct kvm_vcpu *vcpu2)
>>>>>>      return vcpu1->arch.apic_arb_prio - vcpu2->arch.apic_arb_prio;
>>>>>>  }
>>>>>> +static void kvm_ioapic_send_eoi(struct kvm_lapic *apic, int vector)
>>>>>> +{
>>>>>> +    if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
>>>>>> +        kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
>>>>>> +        int trigger_mode;
>>>>>> +        if (apic_test_vector(vector, apic->regs + APIC_TMR))
>>>>>> +            trigger_mode = IOAPIC_LEVEL_TRIG;
>>>>>> +        else
>>>>>> +            trigger_mode = IOAPIC_EDGE_TRIG;
>>>>>> +        kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>>  static int apic_set_eoi(struct kvm_lapic *apic)
>>>>>>  {
>>>>>>      int vector = apic_find_highest_isr(apic);
>>>>>> @@ -756,19 +778,24 @@ static int apic_set_eoi(struct kvm_lapic *apic)
>>>>>>      apic_clear_isr(vector, apic);
>>>>>>      apic_update_ppr(apic);
>>>>>> -    if (!(kvm_apic_get_reg(apic, APIC_SPIV) & APIC_SPIV_DIRECTED_EOI) &&
>>>>>> -        kvm_ioapic_handles_vector(apic->vcpu->kvm, vector)) {
>>>>>> -        int trigger_mode;
>>>>>> -        if (apic_test_vector(vector, apic->regs + APIC_TMR))
>>>>>> -            trigger_mode = IOAPIC_LEVEL_TRIG;
>>>>>> -        else
>>>>>> -            trigger_mode = IOAPIC_EDGE_TRIG;
>>>>>> -        kvm_ioapic_update_eoi(apic->vcpu->kvm, vector, trigger_mode);
>>>>>> -    }
>>>>>> +    kvm_ioapic_send_eoi(apic, vector);
>>>>>>      kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
>>>>>>      return vector;
>>>>>>  }
>>>>>> +/*
>>>>>> + * this interface assumes a trap-like exit, which has already finished
>>>>>> + * the desired side effects including vISR and vPPR update.
>>>>>> + */
>>>>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector)
>>>>>> +{
>>>>>> +    struct kvm_lapic *apic = vcpu->arch.apic;
>>>>>> +
>>>>> trace_kvm_eoi()
>>>> Ok.
>>>>
>>>>>> +    kvm_ioapic_send_eoi(apic, vector);
>>>>>> +    kvm_make_request(KVM_REQ_EVENT, apic->vcpu);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(kvm_apic_set_eoi_accelerated);
>>>>>> +
>>>>>>  static void apic_send_ipi(struct kvm_lapic *apic)
>>>>>>  {
>>>>>>      u32 icr_low = kvm_apic_get_reg(apic, APIC_ICR);
>>>>>> @@ -1533,6 +1560,17 @@ int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu)
>>>>>>      return highest_irr;
>>>>>>  }
>>>>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    struct kvm_lapic *apic = vcpu->arch.apic;
>>>>>> +
>>>>>> +    if (!apic || !apic_enabled(apic))
>>>>> Use kvm_vcpu_has_lapic() instead of checking arch.apic directly.
>>>> Ok.
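For the record, the agreed change to this new helper would look roughly
like the sketch below (restating the function quoted above with Gleb's
substitution; not the final code):

    int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu)
    {
        /* use the existing helper instead of a raw arch.apic check */
        if (!kvm_vcpu_has_lapic(vcpu) || !apic_enabled(vcpu->arch.apic))
            return -1;

        return apic_find_highest_irr(vcpu->arch.apic);
    }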
>>>>
>>>>>
>>>>>> +        return -1;
>>>>>> +
>>>>>> +    return apic_find_highest_irr(apic);
>>>>>> +}
>>>>>> +EXPORT_SYMBOL_GPL(kvm_apic_get_highest_irr);
>>>>>> +
>>>>>>  int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu)
>>>>>>  {
>>>>>>      u32 lvt0 = kvm_apic_get_reg(vcpu->arch.apic, APIC_LVT0);
>>>>>> diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
>>>>>> index c42f111..749661a 100644
>>>>>> --- a/arch/x86/kvm/lapic.h
>>>>>> +++ b/arch/x86/kvm/lapic.h
>>>>>> @@ -39,6 +39,9 @@ void kvm_free_lapic(struct kvm_vcpu *vcpu);
>>>>>>  int kvm_apic_has_interrupt(struct kvm_vcpu *vcpu);
>>>>>>  int kvm_apic_accept_pic_intr(struct kvm_vcpu *vcpu);
>>>>>>  int kvm_get_apic_interrupt(struct kvm_vcpu *vcpu);
>>>>>> +int kvm_cpu_has_extint(struct kvm_vcpu *v);
>>>>>> +int kvm_cpu_get_extint(struct kvm_vcpu *v);
>>>>>> +int kvm_apic_get_highest_irr(struct kvm_vcpu *vcpu);
>>>>>>  void kvm_lapic_reset(struct kvm_vcpu *vcpu);
>>>>>>  u64 kvm_lapic_get_cr8(struct kvm_vcpu *vcpu);
>>>>>>  void kvm_lapic_set_tpr(struct kvm_vcpu *vcpu, unsigned long cr8);
>>>>>> @@ -50,6 +53,8 @@ void kvm_apic_set_version(struct kvm_vcpu *vcpu);
>>>>>>  int kvm_apic_match_physical_addr(struct kvm_lapic *apic, u16 dest);
>>>>>>  int kvm_apic_match_logical_addr(struct kvm_lapic *apic, u8 mda);
>>>>>>  int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq);
>>>>>> +void kvm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
>>>>>> +        int need_eoi, int global);
>>>>>>  int kvm_apic_local_deliver(struct kvm_lapic *apic, int lvt_type);
>>>>>>
>>>>>>  bool kvm_irq_delivery_to_apic_fast(struct kvm *kvm, struct kvm_lapic *src,
>>>>>> @@ -65,6 +70,7 @@ u64 kvm_get_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu);
>>>>>>  void kvm_set_lapic_tscdeadline_msr(struct kvm_vcpu *vcpu, u64 data);
>>>>>>
>>>>>>  int kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset);
>>>>>> +void kvm_apic_set_eoi_accelerated(struct kvm_vcpu *vcpu, int vector);
>>>>>>
>>>>>>  void kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr);
>>>>>>  void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu);
>>>>>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>>>> index dcb7952..8f0903b 100644
>>>>>> --- a/arch/x86/kvm/svm.c
>>>>>> +++ b/arch/x86/kvm/svm.c
>>>>>> @@ -3573,6 +3573,22 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>>>>>          set_cr_intercept(svm, INTERCEPT_CR8_WRITE);
>>>>>>  }
>>>>>> +static int svm_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>> +static void svm_update_irq(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    return;
>>>>>> +}
>>>>>> +
>>>>>> +static void svm_set_eoi_exitmap(struct kvm_vcpu *vcpu, int vector,
>>>>>> +        int trig_mode, int always_set)
>>>>>> +{
>>>>>> +    return;
>>>>>> +}
>>>>>> +
>>>>>>  static int svm_nmi_allowed(struct kvm_vcpu *vcpu)
>>>>>>  {
>>>>>>      struct vcpu_svm *svm = to_svm(vcpu);
>>>>>> @@ -4292,6 +4308,9 @@ static struct kvm_x86_ops svm_x86_ops = {
>>>>>>      .enable_nmi_window = enable_nmi_window,
>>>>>>      .enable_irq_window = enable_irq_window,
>>>>>>      .update_cr8_intercept = update_cr8_intercept,
>>>>>> +    .has_virtual_interrupt_delivery = svm_has_virtual_interrupt_delivery,
>>>>>> +    .update_irq = svm_update_irq,
>>>>>> +    .set_eoi_exitmap = svm_set_eoi_exitmap,
>>>>>>
>>>>>>      .set_tss_addr = svm_set_tss_addr,
>>>>>>      .get_tdp_level = get_npt_level,
>>>>>> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
>>>>>> index 6a5f651..909ce90 100644
>>>>>> --- a/arch/x86/kvm/vmx.c
>>>>>> +++ b/arch/x86/kvm/vmx.c
>>>>>> @@ -86,6 +86,9 @@ module_param(fasteoi, bool, S_IRUGO);
>>>>>>  static bool __read_mostly enable_apicv_reg;
>>>>>>  module_param(enable_apicv_reg, bool, S_IRUGO);
>>>>>> +static bool __read_mostly enable_apicv_vid;
>>>>>> +module_param(enable_apicv_vid, bool, S_IRUGO);
>>>>>> +
>>>>>>  /*
>>>>>>   * If nested=1, nested virtualization is supported, i.e., guests may use
>>>>>>   * VMX and be a hypervisor for its own guests. If nested=0, guests may not
>>>>>> @@ -432,6 +435,9 @@ struct vcpu_vmx {
>>>>>>
>>>>>>      bool rdtscp_enabled;
>>>>>> +    u8 eoi_exitmap_changed;
>>>>>> +    u32 eoi_exit_bitmap[8];
>>>>>> +
>>>>>>      /* Support for a guest hypervisor (nested VMX) */
>>>>>>      struct nested_vmx nested;
>>>>>>  };
>>>>>> @@ -770,6 +776,12 @@ static inline bool cpu_has_vmx_apic_register_virt(void)
>>>>>>          SECONDARY_EXEC_APIC_REGISTER_VIRT;
>>>>>>  }
>>>>>> +static inline bool cpu_has_vmx_virtual_intr_delivery(void)
>>>>>> +{
>>>>>> +    return vmcs_config.cpu_based_2nd_exec_ctrl &
>>>>>> +        SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>>>>>> +}
>>>>>> +
>>>>>>  static inline bool cpu_has_vmx_flexpriority(void)
>>>>>>  {
>>>>>>      return cpu_has_vmx_tpr_shadow() &&
>>>>>> @@ -2508,7 +2520,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>>>>>          SECONDARY_EXEC_PAUSE_LOOP_EXITING |
>>>>>>          SECONDARY_EXEC_RDTSCP |
>>>>>>          SECONDARY_EXEC_ENABLE_INVPCID |
>>>>>> -        SECONDARY_EXEC_APIC_REGISTER_VIRT;
>>>>>> +        SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>>>>> +        SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>>>>>>      if (adjust_vmx_controls(min2, opt2,
>>>>>>              MSR_IA32_VMX_PROCBASED_CTLS2,
>>>>>>              &_cpu_based_2nd_exec_control) < 0)
>>>>>> @@ -2522,7 +2535,8 @@ static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf)
>>>>>>
>>>>>>      if (!(_cpu_based_exec_control & CPU_BASED_TPR_SHADOW))
>>>>>>          _cpu_based_2nd_exec_control &= ~(
>>>>>> -            SECONDARY_EXEC_APIC_REGISTER_VIRT);
>>>>>> +            SECONDARY_EXEC_APIC_REGISTER_VIRT |
>>>>>> +            SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
>>>>>>
>>>>>>      if (_cpu_based_2nd_exec_control & SECONDARY_EXEC_ENABLE_EPT) {
>>>>>>          /* CR3 accesses and invlpg don't need to cause VM Exits when EPT
>>>>>> @@ -2724,6 +2738,14 @@ static __init int hardware_setup(void)
>>>>>>      if (!cpu_has_vmx_apic_register_virt())
>>>>>>          enable_apicv_reg = 0;
>>>>>> +    if (!cpu_has_vmx_virtual_intr_delivery())
>>>>>> +        enable_apicv_vid = 0;
>>>>>> +
>>>>>> +    if (!enable_apicv_vid) {
>>>>>> +        kvm_x86_ops->update_irq = NULL;
>>>>> Why set it to NULL? Either drop this, since vmx_update_irq() checks
>>>>> enable_apicv_vid, or, better, set it to a function that does nothing
>>>>> and drop the enable_apicv_vid check in vmx_update_irq(). Since
>>>>> kvm_x86_ops->update_irq will never be NULL you can drop the check
>>>>> before calling it.
>>>> Sure.
>>>>
>>>>>> +        kvm_x86_ops->update_cr8_intercept = NULL;
>>>>> Why? It should be the other way around: if apicv is enabled, set the
>>>>> update_cr8_intercept callback to NULL.
>>>> Yes, this is wrong.
>>> Please test the patches with vid disabled and Windows guests. This bug
>>> should have prevented it from working.
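To spell out the no-op alternative Gleb suggests, a sketch (the name
vmx_update_irq_noop is made up here, mirroring the svm_update_irq() stub
earlier in the patch):

    static void vmx_update_irq_noop(struct kvm_vcpu *vcpu)
    {
        /* vid disabled: nothing to update, but the pointer is never NULL */
    }

    /* in hardware_setup(), instead of nulling the callback: */
    if (!enable_apicv_vid)
        kvm_x86_ops->update_irq = vmx_update_irq_noop;

Callers can then invoke kvm_x86_ops->update_irq(vcpu) unconditionally,
and the enable_apicv_vid check inside vmx_update_irq() can be removed.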
>>>
>>>>
>>>>>> +    }
>>>>>> +
>>>>>>      if (nested)
>>>>>>          nested_vmx_setup_ctls_msrs();
>>>>>> @@ -3839,6 +3861,8 @@ static u32 vmx_secondary_exec_control(struct vcpu_vmx *vmx)
>>>>>>          exec_control &= ~SECONDARY_EXEC_PAUSE_LOOP_EXITING;
>>>>>>      if (!enable_apicv_reg)
>>>>>>          exec_control &= ~SECONDARY_EXEC_APIC_REGISTER_VIRT;
>>>>>> +    if (!enable_apicv_vid)
>>>>>> +        exec_control &= ~SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY;
>>>>>>      return exec_control;
>>>>>>  }
>>>>>> @@ -3883,6 +3907,15 @@ static int vmx_vcpu_setup(struct vcpu_vmx *vmx)
>>>>>>              vmx_secondary_exec_control(vmx));
>>>>>>      }
>>>>>> +    if (enable_apicv_vid) {
>>>>>> +        vmcs_write64(EOI_EXIT_BITMAP0, 0);
>>>>>> +        vmcs_write64(EOI_EXIT_BITMAP1, 0);
>>>>>> +        vmcs_write64(EOI_EXIT_BITMAP2, 0);
>>>>>> +        vmcs_write64(EOI_EXIT_BITMAP3, 0);
>>>>>> +
>>>>>> +        vmcs_write16(GUEST_INTR_STATUS, 0);
>>>>>> +    }
>>>>>> +
>>>>>>      if (ple_gap) {
>>>>>>          vmcs_write32(PLE_GAP, ple_gap);
>>>>>>          vmcs_write32(PLE_WINDOW, ple_window);
>>>>>> @@ -4806,6 +4839,16 @@ static int handle_apic_access(struct kvm_vcpu *vcpu)
>>>>>>      return emulate_instruction(vcpu, 0) == EMULATE_DONE;
>>>>>>  }
>>>>>> +static int handle_apic_eoi_induced(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>>>>>> +    int vector = exit_qualification & 0xff;
>>>>>> +
>>>>>> +    /* EOI-induced VM exit is trap-like and thus no need to adjust IP */
>>>>>> +    kvm_apic_set_eoi_accelerated(vcpu, vector);
>>>>>> +    return 1;
>>>>>> +}
>>>>>> +
>>>>>>  static int handle_apic_write(struct kvm_vcpu *vcpu)
>>>>>>  {
>>>>>>      unsigned long exit_qualification = vmcs_readl(EXIT_QUALIFICATION);
>>>>>> @@ -5755,6 +5798,7 @@ static int (*const kvm_vmx_exit_handlers[])(struct kvm_vcpu *vcpu) = {
>>>>>>      [EXIT_REASON_TPR_BELOW_THRESHOLD]     = handle_tpr_below_threshold,
>>>>>>      [EXIT_REASON_APIC_ACCESS]             = handle_apic_access,
>>>>>>      [EXIT_REASON_APIC_WRITE]              = handle_apic_write,
>>>>>> +    [EXIT_REASON_EOI_INDUCED]             = handle_apic_eoi_induced,
>>>>>>      [EXIT_REASON_WBINVD]                  = handle_wbinvd,
>>>>>>      [EXIT_REASON_XSETBV]                  = handle_xsetbv,
>>>>>>      [EXIT_REASON_TASK_SWITCH]             = handle_task_switch,
>>>>>> @@ -6096,6 +6140,11 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>>>>>>
>>>>>>  static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>>>>>  {
>>>>>> +    /* no need for tpr_threshold update if APIC virtual
>>>>>> +     * interrupt delivery is enabled */
>>>>>> +    if (!enable_apicv_vid)
>>>>>> +        return;
>>>>>> +
>>>>> Since you (will) set the ->update_cr8_intercept callback to NULL if
>>>>> vid is enabled, this function will never be called with
>>>>> !enable_apicv_vid, so this check can be dropped.
>>>> Ok.
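Restated as code, the corrected setup Gleb described earlier would be
roughly (a sketch, not the posted patch):

    /* in hardware_setup(): vid replaces the TPR-threshold machinery,
     * so disable the update_cr8_intercept callback only when vid is on */
    if (enable_apicv_vid)
        kvm_x86_ops->update_cr8_intercept = NULL;
    else
        kvm_x86_ops->update_irq = vmx_update_irq_noop; /* hypothetical no-op from above */

Either way, once the callback pointer encodes the decision, any
enable_apicv_vid check inside the callback body is redundant.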
>>>>
>>>>>>      if (irr == -1 || tpr < irr) {
>>>>>>          vmcs_write32(TPR_THRESHOLD, 0);
>>>>>>          return;
>>>>>> @@ -6104,6 +6153,90 @@ static void update_cr8_intercept(struct kvm_vcpu *vcpu, int tpr, int irr)
>>>>>>      vmcs_write32(TPR_THRESHOLD, irr);
>>>>>>  }
>>>>>> +static int vmx_has_virtual_interrupt_delivery(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    return irqchip_in_kernel(vcpu->kvm) && enable_apicv_vid;
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_update_eoi_exitmap(struct vcpu_vmx *vmx, int index)
>>>>>> +{
>>>>>> +    int tmr;
>>>>>> +    tmr = kvm_apic_get_reg(vmx->vcpu.arch.apic,
>>>>>> +            APIC_TMR + 0x10 * index);
>>>>>> +    vmcs_write32(EOI_EXIT_BITMAP0 + index,
>>>>>> +            vmx->eoi_exit_bitmap[index] | tmr);
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_update_rvi(int vector)
>>>>>> +{
>>>>>> +    u16 status;
>>>>>> +    u8 old;
>>>>>> +
>>>>>> +    status = vmcs_read16(GUEST_INTR_STATUS);
>>>>>> +    old = (u8)status & 0xff;
>>>>>> +    if ((u8)vector != old) {
>>>>>> +        status &= ~0xff;
>>>>>> +        status |= (u8)vector;
>>>>>> +        vmcs_write16(GUEST_INTR_STATUS, status);
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_update_irq(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    int vector;
>>>>>> +    struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>>>> +
>>>>>> +    if (!enable_apicv_vid)
>>>>>> +        return;
>>>>>> +
>>>>>> +    vector = kvm_apic_get_highest_irr(vcpu);
>>>>>> +    if (vector == -1)
>>>>>> +        return;
>>>>>> +
>>>>>> +    vmx_update_rvi(vector);
>>>>>> +
>>>>>> +    if (vmx->eoi_exitmap_changed) {
>>>>>> +        int index;
>>>>>> +        for_each_set_bit(index,
>>>>>> +                (unsigned long *)(&vmx->eoi_exitmap_changed), 8)
>>>>>> +            vmx_update_eoi_exitmap(vmx, index);
>>>>>> +        vmx->eoi_exitmap_changed = 0;
>>>>>> +    }
>>>>>> +}
>>>>>> +
>>>>>> +static void vmx_set_eoi_exitmap(struct kvm_vcpu *vcpu,
>>>>>> +        int vector, int trig_mode,
>>>>>> +        int always_set)
>>>>>> +{
>>>>>> +    struct vcpu_vmx *vmx = to_vmx(vcpu);
>>>>>> +    int index, offset, changed;
>>>>>> +    struct kvm_lapic *apic;
>>>>>> +
>>>>>> +    if (!enable_apicv_vid)
>>>>>> +        return;
>>>>>> +
>>>>>> +    if (WARN_ONCE((vector < 0) || (vector > 255),
>>>>>> +            "KVM VMX: vector (%d) out of range\n", vector))
>>>>>> +        return;
>>>>>> +
>>>>>> +    apic = vcpu->arch.apic;
>>>>>> +    index = vector >> 5;
>>>>>> +    offset = vector & 31;
>>>>>> +
>>>>>> +    if (always_set)
>>>>>> +        changed = !test_and_set_bit(offset,
>>>>>> +                (unsigned long *)&vmx->eoi_exit_bitmap);
>>>>>> +    else if (trig_mode)
>>>>>> +        changed = !test_bit(offset,
>>>>>> +                apic->regs + APIC_TMR + index * 0x10);
>>>>>> +    else
>>>>>> +        changed = test_bit(offset,
>>>>>> +                apic->regs + APIC_TMR + index * 0x10);
>>>>>> +
>>>>>> +    if (changed)
>>>>>> +        vmx->eoi_exitmap_changed |= 1 << index;
>>>>>> +}
>>>>>> +
>>>>>>  static void vmx_complete_atomic_exit(struct vcpu_vmx *vmx)
>>>>>>  {
>>>>>>      u32 exit_intr_info;
>>>>>> @@ -7364,6 +7497,9 @@ static struct kvm_x86_ops vmx_x86_ops = {
>>>>>>      .enable_nmi_window = enable_nmi_window,
>>>>>>      .enable_irq_window = enable_irq_window,
>>>>>>      .update_cr8_intercept = update_cr8_intercept,
>>>>>> +    .has_virtual_interrupt_delivery = vmx_has_virtual_interrupt_delivery,
>>>>>> +    .update_irq = vmx_update_irq,
>>>>>> +    .set_eoi_exitmap = vmx_set_eoi_exitmap,
>>>>>>
>>>>>>      .set_tss_addr = vmx_set_tss_addr,
>>>>>>      .get_tdp_level = get_ept_level,
>>>>>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>>>>>> index b0b8abe..02fe194 100644
>>>>>> --- a/arch/x86/kvm/x86.c
>>>>>> +++ b/arch/x86/kvm/x86.c
>>>>>> @@ -164,6 +164,14 @@ static int emulator_fix_hypercall(struct x86_emulate_ctxt *ctxt);
>>>>>>
>>>>>>  static int kvm_vcpu_reset(struct kvm_vcpu *vcpu);
>>>>>> +static inline bool kvm_apic_vid_enabled(struct kvm_vcpu *vcpu)
>>>>>> +{
>>>>>> +    if (kvm_x86_ops->has_virtual_interrupt_delivery)
>>>>> This callback is never NULL.
>>>> Ok.
>>>>
>>>>>> +        return kvm_x86_ops->has_virtual_interrupt_delivery(vcpu);
>>>>>> +
>>>>>> +    return 0;
>>>>>> +}
>>>>>> +
>>>>>>  static inline void kvm_async_pf_hash_reset(struct kvm_vcpu *vcpu)
>>>>>>  {
>>>>>>      int i;
>>>>>> @@ -5533,12 +5541,20 @@ static void inject_pending_event(struct kvm_vcpu *vcpu)
>>>>>>          vcpu->arch.nmi_injected = true;
>>>>>>          kvm_x86_ops->set_nmi(vcpu);
>>>>>>      }
>>>>>> -    } else if (kvm_cpu_has_interrupt(vcpu)) {
>>>>>> -        if (kvm_x86_ops->interrupt_allowed(vcpu)) {
>>>>>> -            kvm_queue_interrupt(vcpu, kvm_cpu_get_interrupt(vcpu),
>>>>>> -                    false);
>>>>>> +    } else if (kvm_cpu_has_interrupt(vcpu) &&
>>>>>> +            kvm_x86_ops->interrupt_allowed(vcpu)) {
>>>>>> +        int vector = -1;
>>>>>> +
>>>>>> +        if (kvm_apic_vid_enabled(vcpu))
>>>>>> +            vector = kvm_cpu_get_extint(vcpu);
>>>>>> +        else
>>>>>> +            vector = kvm_cpu_get_interrupt(vcpu);
>>>>>> +
>>>>>> +        if (vector != -1) {
>>>>>> +            kvm_queue_interrupt(vcpu, vector, false);
>>>>>>              kvm_x86_ops->set_irq(vcpu);
>>>>>>          }
>>>>> If vid is enabled, kvm_cpu_has_interrupt() should return true only if
>>>>> there is an extint interrupt. Similarly, kvm_cpu_get_interrupt() will
>>>>> only return an extint if vid is enabled. This basically moves the
>>>>> kvm_apic_vid_enabled() logic deeper into the
>>>>> kvm_cpu_(has|get)_interrupt() functions instead of changing the
>>>>> interrupt injection logic here and in vcpu_enter_guest() below. We
>>>>> still need a kvm_cpu_has_interrupt() variant that always checks both
>>>>> extint and apic for use in kvm_arch_vcpu_runnable() though.
>>>> As you mentioned, we still need to check both extint and apic
>>>> interrupts in some cases. So how to do this? Introduce another
>>>> argument to indicate whether to check both?
>>> Yes, we need to check both in kvm_arch_vcpu_runnable(). Another
>>> argument is a good option. We can have two functions:
>>> kvm_cpu_has_injectable_interrupt() for use in the irq injection path
>>> and kvm_cpu_has_interrupt() for use in kvm_arch_vcpu_runnable(). They
>>> will call a common one with an additional argument to avoid code
>>> duplication.
>> Ok. Will follow this way.
>>
>>>>
>>>>>> +
>>>>>>      }
>>>>>>  }
>>>>>> @@ -5657,12 +5673,20 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
>>>>>>      }
>>>>>>
>>>>>>      if (kvm_check_request(KVM_REQ_EVENT, vcpu) || req_int_win) {
>>>>>> +        /* update architecture-specific hints for APIC
>>>>>> +         * virtual interrupt delivery */
>>>>>> +        if (kvm_x86_ops->update_irq)
>>>>>> +            kvm_x86_ops->update_irq(vcpu);
>>>>>> +
>>>>> I do not see why this has to be here instead of inside the
>>>>> if (kvm_lapic_enabled(vcpu)) {} near update_cr8_intercept() a couple
>>>>> of lines below. If you move it there you can drop the apic enable
>>>>> check in kvm_apic_get_highest_irr().
>>>> Yes, it seems ok to move it.
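For the has/get split agreed above, the shape would be roughly the
following sketch (names follow Gleb's suggestion but are not final, and
it assumes kvm_apic_vid_enabled() is made visible to irq.c):

    /* common helper: with vid enabled the apic IRR is delivered by
     * hardware, so only an extint is injectable by software */
    static int __kvm_cpu_has_interrupt(struct kvm_vcpu *v, bool injectable_only)
    {
        if (kvm_cpu_has_extint(v))
            return 1;

        if (injectable_only && kvm_apic_vid_enabled(v))
            return 0;

        return kvm_apic_has_interrupt(v) != -1;
    }

    /* for the irq injection path */
    int kvm_cpu_has_injectable_interrupt(struct kvm_vcpu *v)
    {
        return __kvm_cpu_has_interrupt(v, true);
    }

    /* for kvm_arch_vcpu_runnable(): always checks both extint and apic */
    int kvm_cpu_has_interrupt(struct kvm_vcpu *v)
    {
        return __kvm_cpu_has_interrupt(v, false);
    }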
>>>>
>>>>>>          inject_pending_event(vcpu);
>>>>>>
>>>>>>          /* enable NMI/IRQ window open exits if needed */
>>>>>>          if (vcpu->arch.nmi_pending)
>>>>>>              kvm_x86_ops->enable_nmi_window(vcpu);
>>>>>> -        else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>>>>>> +        else if (kvm_apic_vid_enabled(vcpu)) {
>>>>>> +            if (kvm_cpu_has_extint(vcpu))
>>>>>> +                kvm_x86_ops->enable_irq_window(vcpu);
>>>>>> +        } else if (kvm_cpu_has_interrupt(vcpu) || req_int_win)
>>>>>>              kvm_x86_ops->enable_irq_window(vcpu);
>>>>>>
>>>>>>          if (kvm_lapic_enabled(vcpu)) {
>>>>>> diff --git a/virt/kvm/ioapic.c b/virt/kvm/ioapic.c
>>>>>> index 166c450..898aa62 100644
>>>>>> --- a/virt/kvm/ioapic.c
>>>>>> +++ b/virt/kvm/ioapic.c
>>>>>> @@ -186,6 +186,7 @@ static int ioapic_deliver(struct kvm_ioapic *ioapic, int irq)
>>>>>>          /* need to read apic_id from the apic register since
>>>>>>           * it can be rewritten */
>>>>>>          irqe.dest_id = ioapic->kvm->bsp_vcpu_id;
>>>>>> +        kvm_set_eoi_exitmap(ioapic->kvm->vcpus[0], irqe.vector, 1, 1);
>>>>>>      }
>>>>>>  #endif
>>>>>>      return kvm_irq_delivery_to_apic(ioapic->kvm, NULL, &irqe);
>>>>>> --
>>>>>> 1.7.1
>>>>>
>>>>> --
>>>>> Gleb.
>>>>
>>>> Best regards,
>>>> Yang
>>>
>>> --
>>> Gleb.
>>
>> Best regards,
>> Yang
>
> --
> Gleb.

Best regards,
Yang

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html