On 19/11/2017 17:25, Liran Alon wrote: > L2 RDMSR could be handled as described below: > 1) L2 exits to L0 on RDMSR and calls handle_rdmsr(). > 2) handle_rdmsr() calls kvm_inject_gp() which sets > KVM_REQ_EVENT, exception.pending=true and exception.injected=false. > 3) vcpu_enter_guest() consumes KVM_REQ_EVENT and calls > inject_pending_event() which calls vmx_check_nested_events() > which sees that exception.pending=true but > nested_vmx_check_exception() returns 0 and therefore does nothing at > this point. However let's assume it later sees vmx-preemption-timer > expired and therefore exits from L2 to L1 by calling > nested_vmx_vmexit().> 4) nested_vmx_vmexit() calls prepare_vmcs12() > which calls vmcs12_save_pending_event() but it does nothing as > exception.injected is false. Also prepare_vmcs12() calls > kvm_clear_exception_queue() which does nothing as > exception.injected is already false. > 5) We now return from vmx_check_nested_events() with 0 while still > having exception.pending=true! > 6) Therefore inject_pending_event() continues > and we inject L2 exception to L1!... But this is buggy as well, because the #GP is lost, isn't it? Is the bug that the preemption timer vmexit should only be injected if there are no pending exceptions? In fact, the same bug could also happened for NMIs or interrupts, I think. So, something like (101% untested): diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c index 5b436f4e6e3e..64eecb8b126d 100644 --- a/arch/x86/kvm/vmx.c +++ b/arch/x86/kvm/vmx.c @@ -11137,8 +11137,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) bool block_nested_events = vmx->nested.nested_run_pending || kvm_event_needs_reinjection(vcpu); - if (vcpu->arch.exception.pending && - nested_vmx_check_exception(vcpu, &exit_qual)) { + if (vcpu->arch.exception.pending) { + if (!nested_vmx_check_exception(vcpu, &exit_qual)) + return 0; if (block_nested_events) return -EBUSY; nested_vmx_inject_exception_vmexit(vcpu, exit_qual); @@ -11146,15 +11147,9 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) return 0; } - if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) && - vmx->nested.preemption_timer_expired) { - if (block_nested_events) - return -EBUSY; - nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0); - return 0; - } - - if (vcpu->arch.nmi_pending && nested_exit_on_nmi(vcpu)) { + if (vcpu->arch.nmi_pending) { + if (!nested_exit_on_nmi(vcpu)) + return 0; if (block_nested_events) return -EBUSY; nested_vmx_vmexit(vcpu, EXIT_REASON_EXCEPTION_NMI, @@ -11169,14 +11164,23 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) return 0; } - if ((kvm_cpu_has_interrupt(vcpu) || external_intr) && - nested_exit_on_intr(vcpu)) { + if (kvm_cpu_has_interrupt(vcpu) || external_intr) { + if (!nested_exit_on_intr(vcpu)) + return 0; if (block_nested_events) return -EBUSY; nested_vmx_vmexit(vcpu, EXIT_REASON_EXTERNAL_INTERRUPT, 0, 0); return 0; } + if (nested_cpu_has_preemption_timer(get_vmcs12(vcpu)) && + vmx->nested.preemption_timer_expired) { + if (block_nested_events) + return -EBUSY; + nested_vmx_vmexit(vcpu, EXIT_REASON_PREEMPTION_TIMER, 0, 0); + return 0; + } + vmx_complete_nested_posted_interrupt(vcpu); return 0; } Paolo > This commit will fix above issue by changing step (4) to > clear exception.pending in kvm_clear_exception_queue(). > > Fixes: 664f8e26b00c ("KVM: X86: Fix loss of exception which has not > yet been injected") > > Signed-off-by: Liran Alon <liran.alon@xxxxxxxxxx> > Reviewed-by: Nikita Leshenko <nikita.leshchenko@xxxxxxxxxx> > Reviewed-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx> > Signed-off-by: Krish Sadhukhan <krish.sadhukhan@xxxxxxxxxx> > --- > arch/x86/kvm/vmx.c | 1 - > arch/x86/kvm/x86.h | 1 + > 2 files changed, 1 insertion(+), 1 deletion(-) > > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 7c3522a989d0..bee08782c781 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -11035,7 +11035,6 @@ static int vmx_check_nested_events(struct kvm_vcpu *vcpu, bool external_intr) > if (vmx->nested.nested_run_pending) > return -EBUSY; > nested_vmx_inject_exception_vmexit(vcpu, exit_qual); > - vcpu->arch.exception.pending = false; > return 0; > } > > diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h > index d0b95b7a90b4..6d112d8f799c 100644 > --- a/arch/x86/kvm/x86.h > +++ b/arch/x86/kvm/x86.h > @@ -12,6 +12,7 @@ > > static inline void kvm_clear_exception_queue(struct kvm_vcpu *vcpu) > { > + vcpu->arch.exception.pending = false; > vcpu->arch.exception.injected = false; > } > > Should kvm_clear_interrupt_queue do the same? Thanks, Paolo