On Fri, Jun 02, 2023, Alexey Kardashevskiy wrote:
> Sean, ping?
>
> I wonder if this sev-es-not-singlestepping is a showstopper or it is alright
> to repost this patchset without it? Thanks,

Ah, shoot, I completely lost this in my inbox. Sorry :-/

> > > Side topic, isn't there an existing bug regarding SEV-ES NMI windows?
> > > KVM can't actually single-step an SEV-ES guest, but tries to set
> > > RFLAGS.TF anyways.
> >
> > Why is it a "bug" and what does the patch fix? Sound to me as it is
> > pointless and the guest won't do single stepping and instead will run
> > till it exits somehow, what do I miss?

The bug is benign in the end, but it's still a bug. I'm not worried about fixing
any behavior, but I dislike having dead, misleading code, especially for something
like this where both NMI virtualization and SEV-ES are already crazy complex and
subtle. I think it's safe to say that I've spent more time digging through SEV-ES
and NMI virtualization than most KVM developers, and as evidenced by the number of
things I got wrong below, I'm still struggling to keep track of the bigger picture.
Developers that are new to all of this need as much help as they can get.

> > > Blech, and suppressing EFER.SVME in efer_trap() is a bit gross,
> >
> > Why suppressed? svm_set_efer() sets it eventually anyway.

svm_set_efer() sets SVME in hardware, but KVM's view of the guest's value that's
stored in vcpu->arch.efer doesn't have SVME set. E.g. from the guest's perspective,
EFER.SVME will have "Reserved Read As Zero" semantics.

> > > but I suppose since the GHCB doesn't allow for CLGI or STGI it's "fine".
> >
> > GHCB does not mention this, instead these are always intercepted in
> > init_vmcb().

Right, I'm calling out that the absence of protocol support for requesting CLGI
or STGI emulation means dropping the guest's EFER.SVME is ok (though gross :-) ).

> > > E.g. shouldn't KVM do this?
> >
> > It sure can and I am happy to include this into the series, the commit
> > log is what I am struggling with :)
> >
> > >
> > > diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
> > > index ca32389f3c36..4e4a49031efe 100644
> > > --- a/arch/x86/kvm/svm/svm.c
> > > +++ b/arch/x86/kvm/svm/svm.c
> > > @@ -3784,6 +3784,16 @@ static void svm_enable_nmi_window(struct
> > > kvm_vcpu *vcpu)
> > >          if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
> > >                  return; /* IRET will cause a vm exit */
> > > +       /*
> > > +        * KVM can't single-step SEV-ES guests and instead assumes
> > > that IRET
> > > +        * in the guest will always succeed,
> >
> > It relies on GHCB's NMI_COMPLETE (which SVM than handles is it was IRET):
> >
> >         case SVM_VMGEXIT_NMI_COMPLETE:
> >                 ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
> >                 break;

Ah, right, better to say that the guest is responsible for signaling that it's
ready to accept NMIs, which KVM handles by "emulating" IRET.

> > > i.e. clears NMI masking on the
> > > +        * next VM-Exit.  Note, GIF is guaranteed to be '1' for
> > > SEV-ES guests
> > > +        * as the GHCB doesn't allow for CLGI or STGI (and KVM suppresses
> > > +        * EFER.SVME for good measure, see efer_trap()).
> >
> > SVM KVM seems to not enforce EFER.SVME, the guest does what it wants and
> > KVM is only told the new value via EFER_WRITE_TRAP. And "writes by
> > SEV-ES guests to EFER.SVME are always ignored by hardware" says the APM.

Ahhh, that blurb in the APM is what I'm missing.

Actually, there's a real bug here. KVM doesn't immediately unmask NMIs in response
to NMI_COMPLETE, and instead goes through the whole awaiting_iret_completion =>
svm_complete_interrupts(), which means that KVM doesn't unmask NMIs until the *next*
VM-Exit. Theoretically, that could be never, e.g. if the host is tickless and the
guest is configured to busy wait idle CPUs.
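To make the difference concrete, here is a rough userspace toy model of the two
flows. It is only a sketch, not KVM code: the struct, the toy_* helpers and
toy_run() are all made up for illustration, and the only things borrowed from
the real code are the names nmi_masked and awaiting_iret_completion. It assumes
the guest is still handling one NMI while a second one is pending, and counts
how many NMIs get injected after the guest signals NMI_COMPLETE.

```c
#include <stdbool.h>

/* Hypothetical per-vCPU state; NOT the real struct vcpu_svm. */
struct toy_vcpu {
	bool nmi_masked;               /* guest is still handling an NMI */
	bool awaiting_iret_completion; /* unmask is deferred to the next exit */
	bool nmi_pending;              /* host wants to inject another NMI */
	int nmis_injected;             /* NMIs injected during the scenario */
};

/* Stand-in for event injection right before re-entering the guest. */
static void toy_try_inject(struct toy_vcpu *v)
{
	if (v->nmi_pending && !v->nmi_masked) {
		v->nmi_pending = false;
		v->nmi_masked = true;
		v->nmis_injected++;
	}
}

/* Stand-in for the deferred unmask done at the start of every exit. */
static void toy_complete_interrupts(struct toy_vcpu *v)
{
	if (v->awaiting_iret_completion) {
		v->awaiting_iret_completion = false;
		v->nmi_masked = false;
	}
}

/*
 * The guest signals NMI_COMPLETE and then takes 'later_exits' further
 * exits.  Returns the number of NMIs injected over the whole scenario.
 */
int toy_run(bool immediate, int later_exits)
{
	struct toy_vcpu v = { .nmi_masked = true, .nmi_pending = true };
	int i;

	/* Exit #1: the guest's "NMI complete" request. */
	toy_complete_interrupts(&v);
	if (immediate)
		v.nmi_masked = false;              /* unmask right away */
	else
		v.awaiting_iret_completion = true; /* unmask on the next exit */
	toy_try_inject(&v);

	/* Any later exits, if the guest ever takes one. */
	for (i = 0; i < later_exits; i++) {
		toy_complete_interrupts(&v);
		toy_try_inject(&v);
	}

	return v.nmis_injected;
}
```

In this model, toy_run(false, 0) leaves the second NMI pending indefinitely,
which is the "next VM-Exit could be never" case, whereas toy_run(true, 0)
injects it on the very same exit that handled NMI_COMPLETE.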
Attached patches are compile tested only.
From eb126f1c02b418df0b5dce9e3cdbd984fc4b0611 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Tue, 13 Jun 2023 16:08:18 -0700
Subject: [PATCH 1/2] KVM: SVM: Don't defer NMI unblocking until next exit for SEV-ES guests

Immediately mark NMIs as unmasked in response to #VMGEXIT(NMI complete)
instead of setting awaiting_iret_completion and waiting until the *next*
VM-Exit to unmask NMIs. The whole point of "NMI complete" is that the
guest is responsible for telling the hypervisor when it's safe to inject
an NMI, i.e. there's no need to wait. And because there's no IRET to
single-step, the next VM-Exit could be a long time coming, i.e. KVM could
incorrectly hold an NMI pending for far longer than what is required and
expected.

Opportunistically fix a stale reference to HF_IRET_MASK.

Fixes: 4444dfe4050b ("KVM: SVM: Add NMI support for an SEV-ES guest")
Cc: Tom Lendacky <thomas.lendacky@xxxxxxx>
Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
 arch/x86/kvm/svm/sev.c |  5 ++++-
 arch/x86/kvm/svm/svm.c | 10 +++++-----
 2 files changed, 9 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kvm/svm/sev.c b/arch/x86/kvm/svm/sev.c
index d65578d8784d..9a0e74cb6cb9 100644
--- a/arch/x86/kvm/svm/sev.c
+++ b/arch/x86/kvm/svm/sev.c
@@ -2887,7 +2887,10 @@ int sev_handle_vmgexit(struct kvm_vcpu *vcpu)
 					    svm->sev_es.ghcb_sa);
 		break;
 	case SVM_VMGEXIT_NMI_COMPLETE:
-		ret = svm_invoke_exit_handler(vcpu, SVM_EXIT_IRET);
+		++vcpu->stat.nmi_window_exits;
+		svm->nmi_masked = false;
+		kvm_make_request(KVM_REQ_EVENT, vcpu);
+		ret = 1;
 		break;
 	case SVM_VMGEXIT_AP_HLT_LOOP:
 		ret = kvm_emulate_ap_reset_hold(vcpu);
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b29d0650582e..b284706edde2 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2508,12 +2508,13 @@ static int iret_interception(struct kvm_vcpu *vcpu)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
 
+	WARN_ON_ONCE(sev_es_guest(vcpu->kvm));
+
 	++vcpu->stat.nmi_window_exits;
 	svm->awaiting_iret_completion = true;
 
 	svm_clr_iret_intercept(svm);
-	if (!sev_es_guest(vcpu->kvm))
-		svm->nmi_iret_rip = kvm_rip_read(vcpu);
+	svm->nmi_iret_rip = kvm_rip_read(vcpu);
 
 	kvm_make_request(KVM_REQ_EVENT, vcpu);
 	return 1;
@@ -3916,12 +3917,11 @@ static void svm_complete_interrupts(struct kvm_vcpu *vcpu)
 	svm->soft_int_injected = false;
 
 	/*
-	 * If we've made progress since setting HF_IRET_MASK, we've
+	 * If we've made progress since setting awaiting_iret_completion, we've
 	 * executed an IRET and can allow NMI injection.
 	 */
 	if (svm->awaiting_iret_completion &&
-	    (sev_es_guest(vcpu->kvm) ||
-	     kvm_rip_read(vcpu) != svm->nmi_iret_rip)) {
+	    kvm_rip_read(vcpu) != svm->nmi_iret_rip) {
 		svm->awaiting_iret_completion = false;
 		svm->nmi_masked = false;
 		kvm_make_request(KVM_REQ_EVENT, vcpu);

base-commit: 5e74470e279654d9fa8742184c8c89837b899078
-- 
2.41.0.162.gfafddb0af9-goog

From fe7634942b49a243ec42ca1aaa8b9354c126b2a3 Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@xxxxxxxxxx>
Date: Tue, 13 Jun 2023 15:50:44 -0700
Subject: [PATCH 2/2] KVM: SVM: Don't try to pointlessly single-step SEV-ES guests for NMI window

Bail early from svm_enable_nmi_window() for SEV-ES guests without trying
to enable single-step of the guest, as single-stepping an SEV-ES guest is
impossible and the guest is responsible for *telling* KVM when it is ready
for a new NMI to be injected.

Functionally, setting TF and RF in svm->vmcb->save.rflags is benign as the
field is ignored by hardware, but it's all kinds of confusing.

Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
---
 arch/x86/kvm/svm/svm.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index b284706edde2..06d50c9c1e48 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -3768,6 +3768,20 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
 	if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
 		return; /* IRET will cause a vm exit */
 
+	/*
+	 * SEV-ES guests are responsible for signaling when a vCPU is ready to
+	 * receive a new NMI, as SEV-ES guests can't be single-stepped, i.e.
+	 * KVM can't intercept and single-step IRET to detect when NMIs are
+	 * unblocked (architecturally speaking).  See SVM_VMGEXIT_NMI_COMPLETE.
+	 *
+	 * Note, GIF is guaranteed to be '1' for SEV-ES guests as hardware
+	 * ignores SEV-ES guest writes to EFER.SVME, KVM suppresses EFER.SVME
+	 * (see efer_trap()), *and* CLGI/STGI are not supported NAEs in the
+	 * GHCB protocol.
+	 */
+	if (sev_es_guest(vcpu->kvm))
+		return;
+
 	if (!gif_set(svm)) {
 		if (vgif)
 			svm_set_intercept(svm, INTERCEPT_STGI);
-- 
2.41.0.162.gfafddb0af9-goog