On Fri, Jun 16, 2017 at 3:26 PM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote: > 2017-06-15 13:20+0200, Ladi Prosek: >> Nested hypervisor should not see singlestep VM exits if singlestepping >> was enabled internally by KVM. Windows is particularly sensitive to this >> and known to bluescreen on unexpected VM exits. >> >> Signed-off-by: Ladi Prosek <lprosek@xxxxxxxxxx> >> --- >> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c >> @@ -966,9 +967,13 @@ static void svm_disable_lbrv(struct vcpu_svm *svm) >> static void disable_nmi_singlestep(struct vcpu_svm *svm) >> { >> svm->nmi_singlestep = false; >> - if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) >> - svm->vmcb->save.rflags &= >> - ~(X86_EFLAGS_TF | X86_EFLAGS_RF); >> + if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) { >> + /* Clear our flags if they were not set by the guest */ >> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF)) >> + svm->vmcb->save.rflags &= ~X86_EFLAGS_TF; >> + if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF)) >> + svm->vmcb->save.rflags &= ~X86_EFLAGS_RF; > > IIUC, we intercept/fault on IRET, disable the interception, set TF+RF > and enter again, the CPU executes IRET and then we get a #DB exit. > > IRET pops EFLAGS from before the NMI -- doesn't the CPU properly restore > EFLAGS, so we do not need this part here? My test VM doesn't work without this part, even with the change that Paolo suggested in 0/4: diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c index d1efe2c62b3f..15a2f7f8e539 100644 --- a/arch/x86/kvm/svm.c +++ b/arch/x86/kvm/svm.c @@ -4622,6 +4622,9 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu) if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK) return; /* IRET will cause a vm exit */ + if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_GIF_MASK)) + == HF_NMI_MASK) + return; /* STGI will cause a vm exit */ Let's see what we're singlestepping over (Windows Server 2016 with Hyper-V). 1. The very first time we enable NMI singlestep is when running nested. The nested guest's rip is just after 'pause' in a 'pause; cmp; jne' loop. svm_nmi_allowed returns false because nested_svm_nmi returns false - we don't want to inject NMI now because svm->nested.intercept has the INTERCEPT_NMI bit set. 2. Then we find ourselves in L1 with the rip just after 'vmrun'. svm_nmi_allowed returns false because gif_set returns false (hflags == HF_HIF_MASK | HF_VINTR_MASK). So we, again, enable NMI singlestep (without having disabled it yet). 3. We singlestep over the instruction immediately following 'vmrun' (it's a 'mov rax, [rsp+0x20]' in this trace) and finally disable NMI singlestep on this vcpu. We inject the NMI when the guest executes stgi. I'll see if I can short this out. Setting the trap flag to step over an instruction which has no other significance than following a 'vmrun' is indeed unnecessary. Thanks! Ladi