2017-06-20 09:41+0200, Ladi Prosek:
> On Mon, Jun 19, 2017 at 6:17 PM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
> > 2017-06-19 15:05+0200, Ladi Prosek:
> >> On Mon, Jun 19, 2017 at 2:50 PM, Ladi Prosek <lprosek@xxxxxxxxxx> wrote:
> >> > On Fri, Jun 16, 2017 at 3:26 PM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
> >> >> 2017-06-15 13:20+0200, Ladi Prosek:
> >> >>> Nested hypervisor should not see singlestep VM exits if singlestepping
> >> >>> was enabled internally by KVM. Windows is particularly sensitive to this
> >> >>> and known to bluescreen on unexpected VM exits.
> >> >>>
> >> >>> Signed-off-by: Ladi Prosek <lprosek@xxxxxxxxxx>
> >> >>> ---
> >> >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> >>> @@ -966,9 +967,13 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
> >> >>>  static void disable_nmi_singlestep(struct vcpu_svm *svm)
> >> >>>  {
> >> >>>  	svm->nmi_singlestep = false;
> >> >>> -	if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
> >> >>> -		svm->vmcb->save.rflags &=
> >> >>> -			~(X86_EFLAGS_TF | X86_EFLAGS_RF);
> >> >>> +	if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
> >> >>> +		/* Clear our flags if they were not set by the guest */
> >> >>> +		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
> >> >>> +			svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
> >> >>> +		if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
> >> >>> +			svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
> >> >>
> >> >> IIUC, we intercept/fault on IRET, disable the interception, set TF+RF
> >> >> and enter again, the CPU executes IRET and then we get a #DB exit.
> >> >>
> >> >> IRET pops EFLAGS from before the NMI -- doesn't the CPU properly restore
> >> >> EFLAGS, so we do not need this part here?
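The logic in the hunk above boils down to: when tearing down NMI single-stepping, clear TF and RF only if the guest had not set them itself before KVM saved its rflags. A standalone sketch of just that rule; the helper name and simplified signature are invented for illustration and are not the actual svm.c code:

```c
#include <stdint.h>

/* EFLAGS bit positions as defined by the x86 architecture. */
#define X86_EFLAGS_TF (1u << 8)   /* trap flag */
#define X86_EFLAGS_RF (1u << 16)  /* resume flag */

/*
 * Hypothetical stand-in for the rflags restoration: KVM set TF+RF to
 * single-step the guest; on teardown, drop each flag unless it was
 * already present in the guest's own rflags (guest_rflags) beforehand.
 */
static uint32_t restore_rflags(uint32_t rflags, uint32_t guest_rflags)
{
	if (!(guest_rflags & X86_EFLAGS_TF))
		rflags &= ~X86_EFLAGS_TF;
	if (!(guest_rflags & X86_EFLAGS_RF))
		rflags &= ~X86_EFLAGS_RF;
	return rflags;
}
```

E.g. a guest that was single-stepping itself (TF set in guest_rflags) keeps its TF bit, while KVM-injected TF/RF are stripped.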
> >> >
> >> > My test VM doesn't work without this part, even with the change that
> >> > Paolo suggested in 0/4:
> >> >
> >> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >> > index d1efe2c62b3f..15a2f7f8e539 100644
> >> > --- a/arch/x86/kvm/svm.c
> >> > +++ b/arch/x86/kvm/svm.c
> >> > @@ -4622,6 +4622,9 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
> >> >  	if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
> >> >  	    == HF_NMI_MASK)
> >> >  		return; /* IRET will cause a vm exit */
> >> > +	if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_GIF_MASK))
> >> > +	    == HF_NMI_MASK)
> >> > +		return; /* STGI will cause a vm exit */
> >> >
> >> > Let's see what we're singlestepping over (Windows Server 2016 with Hyper-V).
> >> >
> >> > 1. The very first time we enable NMI singlestep is when running
> >> > nested. The nested guest's rip is just after 'pause' in a 'pause; cmp;
> >> > jne' loop. svm_nmi_allowed returns false because nested_svm_nmi
> >> > returns false - we don't want to inject NMI now because
> >> > svm->nested.intercept has the INTERCEPT_NMI bit set.
> >> >
> >> > 2. Then we find ourselves in L1 with the rip just after 'vmrun'.
> >> > svm_nmi_allowed returns false because gif_set returns false (hflags ==
> >> > HF_HIF_MASK | HF_VINTR_MASK). So we, again, enable NMI singlestep
> >> > (without having disabled it yet).
> >> >
> >> > 3. We singlestep over the instruction immediately following 'vmrun'
> >> > (it's a 'mov rax, [rsp+0x20]' in this trace) and finally disable NMI
> >> > singlestep on this vcpu. We inject the NMI when the guest executes
> >> > stgi.
> >> >
> >> > I'll see if I can sort this out. Setting the trap flag to step over
> >> > an instruction which has no other significance than following a
> >> > 'vmrun' is indeed unnecessary.
> >>
> >> Ok, I think I understand it.
> >>
> >> First, Paolo's GIF change should be checking only HF_GIF_MASK.
> >> Whether
> >> we're currently in an NMI or not does not make a difference; NMI is
> >> not allowed and we have interception in place for when GIF is set, so
> >> we don't need to singlestep.
> >>
> >> Second, enable_nmi_window can also do nothing if
> >> svm->nested.exit_required. We're not really going to run the guest in
> >> this case, so no need to singlestep.
> >
> > Makes sense.
> >
> >> With these two tweaks my test VM does not generate any DB exits. We
> >> still occasionally set TF if we have an NMI and interrupt pending at
> >> the same time (see inject_pending_event where we check
> >> vcpu->arch.interrupt.pending, then later vcpu->arch.nmi_pending), but
> >> we clear it immediately in the new code added in patch 4.
> >
> > Right, we only need the single step over IRET and interrupt shadow.
> >
> > Btw. instead of single-stepping over IRET/interrupt shadow, could we set
> > INTERRUPT_SHADOW in VMCB, inject the NMI, and let it execute?
> > This mechanism would explain why AMD didn't provide a trap for IRET ...
> >
> > APM 15.20 says "Injected events are treated in every way as though they
> > had occurred normally in the guest", which makes me think that
> > INTERRUPT_SHADOW blocks them, if it blocks NMIs at all on AMD.
>
> Unfortunately this does not seem to work: eventinj.flat fails with
> your change. There may be more vmexits before rip finally moves
> forward, so I have added a bool flag "nmi_int_shadow" and keep setting
> SVM_INTERRUPT_SHADOW_MASK on entry as long as the flag is set. The
> flag is reset in svm_complete_interrupts, which conveniently has the
> logic for this already. As far as I can tell, the NMI is still injected
> before the IRET.
>
> eventinj.flat output:
>
> Before NMI IRET test
> Sending NMI to self
> NMI isr running stack 0x461000
> Sending nested NMI to self
> After nested NMI to self
> Nested NMI isr running rip=40038e (expecting 400390)
> ...

At least we know the current code isn't bad for no reason. :)

Thanks for trying it out!
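For the record, the two enable_nmi_window early-outs we converged on (the existing IRET check plus a GIF-only check, per the tweak above) can be condensed into a small model. The helper name and flag values below are invented for illustration; this is not the real svm.c code:

```c
#include <stdbool.h>

/* Illustrative hflags bits; the real values live in the KVM headers. */
#define HF_GIF_MASK  (1u << 0)
#define HF_NMI_MASK  (1u << 1)
#define HF_IRET_MASK (1u << 2)

/*
 * Sketch of the early-out logic: single-stepping to open an NMI window
 * is unnecessary whenever a later vm exit will open it for us anyway --
 * either the IRET intercept (NMI in service, IRET not yet seen) or the
 * STGI intercept (GIF currently clear).
 */
static bool nmi_window_needs_singlestep(unsigned int hflags)
{
	if ((hflags & (HF_NMI_MASK | HF_IRET_MASK)) == HF_NMI_MASK)
		return false; /* IRET will cause a vm exit */
	if (!(hflags & HF_GIF_MASK))
		return false; /* STGI will cause a vm exit */
	return true;
}
```

Note the second check deliberately looks at HF_GIF_MASK alone: whether an NMI is in service does not matter once GIF is clear, because STGI is intercepted either way.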