2018-03-05 09:39-0800, Sean Christopherson:
> Clear nested_run_pending in handle_invalid_guest_state() after calling
> emulate_instruction(), i.e. after attempting to emulate at least one
> instruction.  This fixes an issue where L0 enters an infinite loop if
> L2 hits an exception that is intercepted by L1 while L0 is emulating
> L2's invalid guest state, effectively causing DoS on L1, e.g. the only
> way to break the loop is to kill Qemu in L0.
>
>     1. call handle_invalid_guest_state() for L2
>     2. emulate_instruction() pends an exception, e.g. #UD
>     3. L1 intercepts the exception, i.e. nested_vmx_check_exception
>        returns 1
>     4. vmx_check_nested_events() returns -EBUSY because L1 wants to
>        intercept the exception and nested_run_pending is true
>     5. handle_invalid_guest_state() never makes forward progress for
>        L2 due to the pending exception
>     6. L1 retries VMLAUNCH and VMExits to L0 indefinitely, i.e. the
>        L1 vCPU trying VMLAUNCH effectively hangs
>
> Signed-off-by: Sean Christopherson <sean.j.christopherson@xxxxxxxxx>
> ---

nested_run_pending signals that we have to execute VMRESUME in order to
do injection from L2's VMCS (at least VM_ENTRY_INTR_INFO_FIELD).

If we don't let the hardware do it, we need to transfer the state from
L2's VMCS while doing a nested VM exit for the exception (= behave as if
we entered the guest and exited).

And I think the actual fix here is to evaluate the interrupt before the
first emulate_instruction() in handle_invalid_guest_state().

Do you want to look deeper into this?

Thanks.

> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 591214843046..3073160e6bae 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -6835,6 +6835,8 @@ static int handle_invalid_guest_state(struct kvm_vcpu *vcpu)
>  
>  		err = emulate_instruction(vcpu, 0);
>  
> +		vmx->nested.nested_run_pending = 0;
> +
>  		if (err == EMULATE_USER_EXIT) {
>  			++vcpu->stat.mmio_exits;
>  			ret = 0;
> -- 
> 2.16.2
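
For illustration only, the loop from steps 1-6 of the quoted changelog can
be sketched as a toy user-space model. This is not kernel code: struct
toy_vcpu, the toy_* functions, TOY_EBUSY and the iteration budget are all
hypothetical stand-ins, and the control flow is heavily simplified from
what vmx.c actually does. It only shows why nothing makes progress while
the flag stays set, and what the posted patch changes.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_EBUSY 16	/* hypothetical stand-in for the kernel's EBUSY */

/* Hypothetical, heavily reduced stand-in for the nested vCPU state. */
struct toy_vcpu {
	bool nested_run_pending;	/* models vmx->nested.nested_run_pending */
	bool exception_pending;		/* set when emulation queues an exception */
	bool l1_intercepts;		/* models nested_vmx_check_exception() != 0 */
	bool exited_to_l1;		/* set once the nested VM exit is delivered */
};

/* Step 2: emulation queues an exception (e.g. #UD) instead of retiring. */
static void toy_emulate_instruction(struct toy_vcpu *v)
{
	v->exception_pending = true;
}

/* Steps 3-4: refuse the nested VM exit while a nested entry is pending. */
static int toy_check_nested_events(struct toy_vcpu *v)
{
	if (v->exception_pending && v->l1_intercepts) {
		if (v->nested_run_pending)
			return -TOY_EBUSY;
		v->exception_pending = false;
		v->exited_to_l1 = true;
	}
	return 0;
}

/*
 * Steps 1, 5 and 6: retry until the exception reaches L1.  Returns the
 * number of iterations needed, or -1 if the budget runs out; the budget
 * exists only so that the model terminates.
 */
static int toy_handle_invalid_guest_state(struct toy_vcpu *v,
					  bool clear_after_emulate, int budget)
{
	assert(budget > 0);

	for (int i = 1; i <= budget; i++) {
		toy_emulate_instruction(v);
		if (clear_after_emulate)	/* models the posted patch */
			v->nested_run_pending = false;
		if (toy_check_nested_events(v) == 0 && v->exited_to_l1)
			return i;
	}
	return -1;
}
```

With nested_run_pending and l1_intercepts both set, calling
toy_handle_invalid_guest_state() with clear_after_emulate = false exhausts
any budget, while clear_after_emulate = true finishes in one iteration.
The model does not capture the concern raised in the reply: clearing the
flag without a hardware VM entry skips the event injection described by
L2's VMCS.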