Re: [PATCH] KVM: nVMX: Fix direct injection of interrupts from L0 to L2

Gleb Natapov <gleb@xxxxxxxxxx> · Sun, 17 Feb 2013 17:07:21 +0200

On Sat, Feb 16, 2013 at 06:10:14PM +0100, Jan Kiszka wrote:
> From: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
> 
> If L1 does not set PIN_BASED_EXT_INTR_MASK, we incorrectly skipped
> vmx_complete_interrupts on L2 exits. This is required because, with
> direct interrupt injection from L0 to L2, L0 has to update its pending
> events.
> 
> Also, we need to allow vmx_cancel_injection when entering L2 if we left
> to L0. This condition is indirectly derived from the absence of valid
> vectoring info in vmcs12. We now explicitly clear it if we find out that
> the L2 exit is not targeting L1 but L0.
> 
We really need to overhaul how interrupt injection is emulated in nested
VMX. Why not put pending events into the event queue instead of
get_vmcs12(vcpu)->idt_vectoring_info_field and inject them the usual way?
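
One possible reading of that suggestion, as a toy model rather than KVM
code: every L2 exit first moves the not-yet-delivered event into the
ordinary queue; only an exit that is actually reflected to L1 turns it
into vmcs12's vectoring info, while an exit handled by L0 just re-injects
from the queue. All names here (toy_queue, toy_complete,
toy_reflect_to_l1, toy_entry_info, TOY_INFO_*) are hypothetical; only the
valid bit (31) and vector bits (7:0) mirror the VMX vectoring-info format,
and type/error-code bits are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins: bit 31 = valid, bits 7:0 = vector. */
#define TOY_INFO_VALID  (1u << 31)
#define TOY_INFO_VECTOR 0xffu

/* Toy per-vCPU event queue (not KVM's real structure). */
struct toy_queue {
    bool pending;
    uint8_t nr;
};

/* Every L2 exit: record the undelivered event in the ordinary queue,
 * before deciding whether L0 or L1 handles the exit. */
static void toy_complete(struct toy_queue *q, uint32_t idt_vectoring_info)
{
    q->pending = (idt_vectoring_info & TOY_INFO_VALID) != 0;
    q->nr = (uint8_t)(idt_vectoring_info & TOY_INFO_VECTOR);
}

/* Exit reflected to L1: only now produce the value L1 would read from
 * vmcs12's vectoring-info field; the event leaves L0's queue. */
static uint32_t toy_reflect_to_l1(struct toy_queue *q)
{
    uint32_t info = q->pending ? (TOY_INFO_VALID | q->nr) : 0;
    q->pending = false;   /* L1 now owns the event */
    return info;
}

/* Exit handled by L0 and L2 resumed: inject from the queue as usual. */
static uint32_t toy_entry_info(const struct toy_queue *q)
{
    return q->pending ? (TOY_INFO_VALID | q->nr) : 0;
}
```

In this sketch nothing L0 wants to inject can be overwritten by a stale
vmcs12 field, because vmcs12 is written only on the L1-bound path.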

> Signed-off-by: Jan Kiszka <jan.kiszka@xxxxxxxxxxx>
> ---
>  arch/x86/kvm/vmx.c |   43 +++++++++++++++++++++++++++----------------
>  1 files changed, 27 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 68a045ae..464b6a5 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -624,6 +624,7 @@ static void vmx_get_segment(struct kvm_vcpu *vcpu,
>  			    struct kvm_segment *var, int seg);
>  static bool guest_state_valid(struct kvm_vcpu *vcpu);
>  static u32 vmx_segment_access_rights(struct kvm_segment *var);
> +static void vmx_complete_interrupts(struct vcpu_vmx *vmx);
>  
>  static DEFINE_PER_CPU(struct vmcs *, vmxarea);
>  static DEFINE_PER_CPU(struct vmcs *, current_vmcs);
> @@ -6213,9 +6214,19 @@ static int vmx_handle_exit(struct kvm_vcpu *vcpu)
>  	else
>  		vmx->nested.nested_run_pending = 0;
>  
> -	if (is_guest_mode(vcpu) && nested_vmx_exit_handled(vcpu)) {
> -		nested_vmx_vmexit(vcpu);
> -		return 1;
> +	if (is_guest_mode(vcpu)) {
> +		if (nested_vmx_exit_handled(vcpu)) {
> +			nested_vmx_vmexit(vcpu);
> +			return 1;
> +		}
> +		/*
> +		 * Now it's clear, we are leaving to L0. Perform the postponed
> +		 * interrupt completion and clear L1's vectoring info field so
> +		 * that we do not overwrite what L0 wants to inject on
> +		 * re-entry.
> +		 */
> +		vmx_complete_interrupts(vmx);
> +		get_vmcs12(vcpu)->idt_vectoring_info_field = 0;
>  	}
>  
>  	if (exit_reason & VMX_EXIT_REASONS_FAILED_VMENTRY) {
> @@ -6495,8 +6506,6 @@ static void __vmx_complete_interrupts(struct vcpu_vmx *vmx,
>  
>  static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>  {
> -	if (is_guest_mode(&vmx->vcpu))
> -		return;
>  	__vmx_complete_interrupts(vmx, vmx->idt_vectoring_info,
>  				  VM_EXIT_INSTRUCTION_LEN,
>  				  IDT_VECTORING_ERROR_CODE);
> @@ -6504,7 +6513,9 @@ static void vmx_complete_interrupts(struct vcpu_vmx *vmx)
>  
>  static void vmx_cancel_injection(struct kvm_vcpu *vcpu)
>  {
> -	if (is_guest_mode(vcpu))
> +	if (is_guest_mode(vcpu) &&
> +	    get_vmcs12(vcpu)->idt_vectoring_info_field &
> +			VECTORING_INFO_VALID_MASK)
Why skip cancel_injection at all? As far as I can see, we can lose an
injected IRQ if we do. Consider:

  io thread                                  vcpu in nested mode
set irr 200
                                          clear irr 200 set isr 200
                                          set 200 in VM_ENTRY_INTR_INFO_FIELD
set irr 250
set KVM_REQ_EVENT
                                          if (KVM_REQ_EVENT)
                                                  vmx_cancel_injection() <- does nothing

                                          clear irr 250 set isr 250
                                          set 250 in VM_ENTRY_INTR_INFO_FIELD
                                          vmentry

So now the APIC state is bogus: ISR bit 200 is set, but vector 200 was
never injected and is lost forever. The next EOI will clear ISR bit 250,
and ISR bit 200 will block all lower-priority interrupts forever.
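
The timeline above can be replayed in a small user-space model, a sketch
that follows the diagram rather than KVM's actual code paths. It keeps
256-entry IRR/ISR bitmaps, a single VM-entry interruption field, and a
"reinject" slot that a working cancel_injection fills. An interrupt is
accepted only if its priority class (vector >> 4) is above that of the
highest in-service vector. All identifiers (toy_vcpu, toy_inject,
toy_cancel_injection, etc.) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NVEC 256

/* Toy vCPU: APIC bitmaps plus the VM-entry interruption field. */
struct toy_vcpu {
    bool irr[NVEC];       /* requested, not yet acknowledged */
    bool isr[NVEC];       /* acknowledged, in service until EOI */
    int entry_intr;       /* vector in the entry field, -1 = none */
    int reinject;         /* vector handed back by cancel, -1 = none */
    bool cancel_works;    /* false models "cancel_injection does nothing" */
    int delivered[NVEC];  /* vectors the guest actually received */
    int ndelivered;
};

static void toy_init(struct toy_vcpu *v, bool cancel_works)
{
    memset(v, 0, sizeof(*v));
    v->entry_intr = -1;
    v->reinject = -1;
    v->cancel_works = cancel_works;
}

static int highest(const bool *bits)
{
    for (int i = NVEC - 1; i >= 0; i--)
        if (bits[i])
            return i;
    return -1;
}

static void toy_set_irr(struct toy_vcpu *v, int vec) { v->irr[vec] = true; }

/* A cancelled event goes first; otherwise acknowledge the highest IRR bit
 * (clear irr, set isr) if its priority class beats the in-service one. */
static void toy_inject(struct toy_vcpu *v)
{
    if (v->reinject >= 0) {
        v->entry_intr = v->reinject;
        v->reinject = -1;
        return;
    }
    int req = highest(v->irr), svc = highest(v->isr);
    if (req < 0 || (svc >= 0 && (req >> 4) <= (svc >> 4)))
        return;                /* nothing deliverable */
    v->irr[req] = false;
    v->isr[req] = true;
    v->entry_intr = req;       /* overwrites any previous entry value */
}

/* Undo an injection that has not reached vmentry yet. */
static void toy_cancel_injection(struct toy_vcpu *v)
{
    if (!v->cancel_works || v->entry_intr < 0)
        return;
    v->reinject = v->entry_intr;   /* ISR bit stays set, event is kept */
    v->entry_intr = -1;
}

static void toy_vmentry(struct toy_vcpu *v)
{
    if (v->entry_intr >= 0)
        v->delivered[v->ndelivered++] = v->entry_intr;
    v->entry_intr = -1;
}

/* Guest EOI clears the highest in-service bit. */
static void toy_eoi(struct toy_vcpu *v)
{
    int svc = highest(v->isr);
    assert(svc >= 0);   /* EOI with nothing in service would be a bug */
    v->isr[svc] = false;
}

/* The diagram: 200 injected, 250 arrives, cancel, inject again, enter. */
static void toy_run_timeline(struct toy_vcpu *v)
{
    toy_set_irr(v, 200);
    toy_inject(v);
    toy_set_irr(v, 250);
    toy_cancel_injection(v);
    toy_inject(v);
    toy_vmentry(v);
}
```

With cancel_works = false the guest receives only 250; after one EOI,
ISR bit 200 is still set and a later vector 190 is never accepted. With
cancel_works = true the guest receives 200 first, 250 stays in IRR and
is delivered next, and two EOIs leave the ISR empty.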

--
			Gleb.