Re: [PATCH 2/4] KVM: nSVM: do not forward NMI window singlestep VM exits to L1

Radim Krčmář <rkrcmar@xxxxxxxxxx> · Mon, 19 Jun 2017 18:17:41 +0200

2017-06-19 15:05+0200, Ladi Prosek:
> On Mon, Jun 19, 2017 at 2:50 PM, Ladi Prosek <lprosek@xxxxxxxxxx> wrote:
> > On Fri, Jun 16, 2017 at 3:26 PM, Radim Krčmář <rkrcmar@xxxxxxxxxx> wrote:
> >> 2017-06-15 13:20+0200, Ladi Prosek:
> >>> Nested hypervisor should not see singlestep VM exits if singlestepping
> >>> was enabled internally by KVM. Windows is particularly sensitive to this
> >>> and known to bluescreen on unexpected VM exits.
> >>>
> >>> Signed-off-by: Ladi Prosek <lprosek@xxxxxxxxxx>
> >>> ---
> >>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> >>> @@ -966,9 +967,13 @@ static void svm_disable_lbrv(struct vcpu_svm *svm)
> >>>  static void disable_nmi_singlestep(struct vcpu_svm *svm)
> >>>  {
> >>>       svm->nmi_singlestep = false;
> >>> -     if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP))
> >>> -             svm->vmcb->save.rflags &=
> >>> -                     ~(X86_EFLAGS_TF | X86_EFLAGS_RF);
> >>> +     if (!(svm->vcpu.guest_debug & KVM_GUESTDBG_SINGLESTEP)) {
> >>> +             /* Clear our flags if they were not set by the guest */
> >>> +             if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_TF))
> >>> +                     svm->vmcb->save.rflags &= ~X86_EFLAGS_TF;
> >>> +             if (!(svm->nmi_singlestep_guest_rflags & X86_EFLAGS_RF))
> >>> +                     svm->vmcb->save.rflags &= ~X86_EFLAGS_RF;
> >>
> >> IIUC, we intercept/fault on IRET, disable the interception, set TF+RF
> >> and enter again, the CPU executes IRET and then we get a #DB exit.
> >>
> >> IRET pops EFLAGS from before the NMI -- doesn't the CPU properly restore
> >> EFLAGS, so we do not need this part here?
> >
> > My test VM doesn't work without this part, even with the change that
> > Paolo suggested in 0/4:
> >
> > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> > index d1efe2c62b3f..15a2f7f8e539 100644
> > --- a/arch/x86/kvm/svm.c
> > +++ b/arch/x86/kvm/svm.c
> > @@ -4622,6 +4622,9 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
> >         if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
> >             == HF_NMI_MASK)
> >                 return; /* IRET will cause a vm exit */
> > +       if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_GIF_MASK))
> > +           == HF_NMI_MASK)
> > +               return; /* STGI will cause a vm exit */
> >
> >
> > Let's see what we're singlestepping over (Windows Server 2016 with Hyper-V).
> >
> > 1. The very first time we enable NMI singlestep is when running
> > nested. The nested guest's rip is just after 'pause' in a 'pause; cmp;
> > jne' loop. svm_nmi_allowed returns false because nested_svm_nmi
> > returns false - we don't want to inject NMI now because
> > svm->nested.intercept has the INTERCEPT_NMI bit set.
> >
> > 2. Then we find ourselves in L1 with the rip just after 'vmrun'.
> > svm_nmi_allowed returns false because gif_set returns false (hflags ==
> > HF_HIF_MASK | HF_VINTR_MASK). So we, again, enable NMI singlestep
> > (without having disabled it yet).
> >
> > 3. We singlestep over the instruction immediately following 'vmrun'
> > (it's a 'mov rax, [rsp+0x20]' in this trace) and finally disable NMI
> > singlestep on this vcpu. We inject the NMI when the guest executes
> > stgi.
> >
> >
> > I'll see if I can short this out. Setting the trap flag to step over
> > an instruction which has no other significance than following a
> > 'vmrun' is indeed unnecessary.
> 
> Ok, I think I understand it.
> 
> First, Paolo's GIF change should be checking only HF_GIF_MASK. Whether
> we're currently in an NMI or not does not make a difference, NMI is
> not allowed and we have interception in place for when GIF is set so
> we don't need to singlestep.
> 
> Second, enable_nmi_window can also do nothing if
> svm->nested.exit_required. We're not really going to run the guest in
> this case so no need to singlestep.

Makes sense.

> With these two tweaks my test VM does not generate any DB exits. We
> still occasionally set TF if we have an NMI and interrupt pending at
> the same time (see inject_pending_event where we check
> vcpu->arch.interrupt.pending, then later vcpu->arch.nmi_pending) but
> we clear it immediately in the new code added in patch 4.

Right, we only need the single step over IRET and interrupt shadow.

Btw. instead of single-stepping over IRET/interrupt shadow, could we set
INTERRUPT_SHADOW in VMCB, inject the NMI, and let it execute?
This mechanism would explain why AMD didn't provide a trap for IRET ...

APM 15.20 says "Injected events are treated in every way as though they
had occurred normally in the guest", which makes me think that
INTERRUPT_SHADOW blocks them, if it blocks NMIs at all on AMD.

e.g.

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 6e3095d1bad4..b564613b4e65 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4624,14 +4624,16 @@ static void enable_nmi_window(struct kvm_vcpu *vcpu)
 
 	if ((svm->vcpu.arch.hflags & (HF_NMI_MASK | HF_IRET_MASK))
 	    == HF_NMI_MASK)
-		return; /* IRET will cause a vm exit */
+		return 0; /* IRET will cause a vm exit */
 
-	/*
-	 * Something prevents NMI from been injected. Single step over possible
-	 * problem (IRET or exception injection or interrupt shadow)
-	 */
-	svm->nmi_singlestep = true;
-	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
+	if (svm->vmcb->control.event_inj)
+		return 1; /* will request VM exit immediately after injection */
+
+	--vcpu->arch.nmi_pending;
+	svm_inject_nmi(vcpu);
+	svm->vmcb->control.int_state |= SVM_INTERRUPT_SHADOW_MASK;
+
+	return 0;
 }
 
 static int svm_set_tss_addr(struct kvm *kvm, unsigned int addr)

(Of course we'd want to refactor it for the final patch. :])

> What a complex state machine! I'll prepare v2.

Yes. :/  Looking forward to v2,

thanks.