On 2/1/2023 5:52 AM, Sean Christopherson wrote:
> On Tue, Nov 29, 2022, Maxim Levitsky wrote:
>> This patch implements support for injecting pending
>> NMIs via the .kvm_x86_set_hw_nmi_pending using new AMD's vNMI
>
> Wrap closer to 75 chars, and please try to be consistent in how you format
> things like changelogs and comments. It's jarring as a reader when the wrap
> column is constantly changing.
>
>> feature.
>
> Please combine the introduction, usage, and implementation of the new kvm_x86_ops,
> i.e. introduce and use the ops in this patch. It's extremely difficult to review
> the common x86 code that uses the ops without seeing how they're implemented in
> SVM. I believe the overall size/scope of the patch can be kept reasonable by
> introducing some of the common changes in advance of the new ops, e.g. tweaking
> the KVM_SET_VCPU_EVENTS flow.
>
>> Note that the vNMI can't cause a VM exit, which is needed
>> when a nested guest intercepts NMIs.
>
> I can't tell if this is saying "SVM doesn't allow intercepting virtual NMIs", or
> "KVM never enables virtual NMI interception".
>

I think it meant to say that vNMI doesn't need the NMI-window-exiting feature
in order to pend a new virtual NMI. Will reword.

>> Therefore to avoid breaking nesting, the vNMI is inhibited while
>> a nested guest is running and instead, the legacy NMI window
>> detection and delivery method is used.
>
> State what KVM does, don't describe the effects. E.g. "Disable vNMI while running
> L2". When a changelog describes the effects, it's unclear whether the effects are
> new behavior introduced by the patch, hardware behavior, etc...
>
>> While it is possible to passthrough the vNMI if a nested guest
>> doesn't intercept NMIs, such usage is very uncommon, and it's
>> not worth to optimize for.
>
> Can you elaborate on why not? It's not obvious to me that the code will end up
> more complex, and oftentimes omitting code increases the cognitive load on readers,
> i.e. makes things more complex in a way. vNMI is mutually exclusive with NMI
> passthrough, i.e. KVM doesn't need to handle NMI passthrough and vNMI simultaneously.
>
>> Signed-off-by: Santosh Shukla <santosh.shukla@xxxxxxx>
>> Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
>
> SoB chain is wrong. Maxim is credited as the sole Author, i.e. Santosh shouldn't
> have a SoB. Assuming the intent is to attribute both of ya'll, this needs to be
>
>   Co-developed-by: Santosh Shukla <santosh.shukla@xxxxxxx>
>   Signed-off-by: Santosh Shukla <santosh.shukla@xxxxxxx>
>   Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
>
> if Maxim is the primary author, or this if Santosh is the primary author
>
>   From: Santosh Shukla <santosh.shukla@xxxxxxx>
>
>   <blah blah blah>
>
>   Signed-off-by: Santosh Shukla <santosh.shukla@xxxxxxx>
>   Co-developed-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
>   Signed-off-by: Maxim Levitsky <mlevitsk@xxxxxxxxxx>
>

Will sort those in v3.

>> ---
>>  arch/x86/kvm/svm/nested.c |  42 +++++++++++++++
>>  arch/x86/kvm/svm/svm.c    | 111 ++++++++++++++++++++++++++++++--------
>>  arch/x86/kvm/svm/svm.h    |  10 ++++
>>  3 files changed, 140 insertions(+), 23 deletions(-)
>>
>> diff --git a/arch/x86/kvm/svm/nested.c b/arch/x86/kvm/svm/nested.c
>> index e891318595113e..5bea672bf8b12d 100644
>> --- a/arch/x86/kvm/svm/nested.c
>> +++ b/arch/x86/kvm/svm/nested.c
>> @@ -623,6 +623,42 @@ static bool is_evtinj_nmi(u32 evtinj)
>>  	return type == SVM_EVTINJ_TYPE_NMI;
>>  }
>>
>> +static void nested_svm_save_vnmi(struct vcpu_svm *svm)
>
> Please avoid save/restore names. KVM (selftests in particular) uses save/restore
> to refer to saving and restoring state across a migration. "sync" is probably
> the best option, or just open code the flows.
>

Ok. I chose to open code it; that way I don't need to consider using
svm->nmi_masked, which should only be used for the non-vNMI case.

>> +{
>> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
>> +
>> +	/*
>> +	 * Copy the vNMI state back to software NMI tracking state
>> +	 * for the duration of the nested run
>> +	 */
>> +
>> +	svm->nmi_masked = vmcb01->control.int_ctl & V_NMI_MASK;
>> +	svm->vcpu.arch.nmi_pending += vmcb01->control.int_ctl & V_NMI_PENDING;
>
> This is wrong. V_NMI_PENDING is bit 11, i.e. the bitwise-AND does not yield a
> boolean value and will increment nmi_pending by 2048 instead of by 1.
>

Right.

> 	if (vmcb01->control.int_ctl & V_NMI_PENDING)
> 		svm->vcpu.arch.nmi_pending++;
>
> And this needs a KVM_REQ_EVENT to ensure KVM processes the newly pending NMI.
>

Ok.
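Folding in both fixes, the open-coded sync on nested VMRUN that I have in mind
looks roughly like the below (just a sketch against the vmcb01 fields this
patch already uses, not final code):

	struct vmcb *vmcb01 = svm->vmcb01.ptr;

	/* Move a pending vNMI into KVM's software tracking for the nested run. */
	if (vmcb01->control.int_ctl & V_NMI_PENDING) {
		svm->vcpu.arch.nmi_pending++;
		kvm_make_request(KVM_REQ_EVENT, &svm->vcpu);
	}

	svm->nmi_masked = !!(vmcb01->control.int_ctl & V_NMI_MASK);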
>> +}
>> +
>> +static void nested_svm_restore_vnmi(struct vcpu_svm *svm)
>> +{
>> +	struct kvm_vcpu *vcpu = &svm->vcpu;
>> +	struct vmcb *vmcb01 = svm->vmcb01.ptr;
>> +
>> +	/*
>> +	 * Restore the vNMI state from the software NMI tracking state
>> +	 * after a nested run
>> +	 */
>> +
>> +	if (svm->nmi_masked)
>> +		vmcb01->control.int_ctl |= V_NMI_MASK;
>> +	else
>> +		vmcb01->control.int_ctl &= ~V_NMI_MASK;
>
> As proposed, this needs to clear nmi_masked to avoid false positives. The better
> approach is to not have any open coded accesses to svm->nmi_masked outside of
> flows that specifically need to deal with vNMI logic.
>
> E.g. svm_enable_nmi_window() reads the raw nmi_masked.
>

Ok.

>> +
>> +	if (vcpu->arch.nmi_pending) {
>> +		vcpu->arch.nmi_pending--;
>> +		vmcb01->control.int_ctl |= V_NMI_PENDING;
>> +	} else
>
> Curly braces on all paths if any path needs 'em.
>

Ok.

>> +		vmcb01->control.int_ctl &= ~V_NMI_PENDING;
>> +}
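The open-coded mirror on nested VM-exit would then be along these lines (again
only a sketch; it clears the software mask per your comment above and uses
braces on both paths):

	struct kvm_vcpu *vcpu = &svm->vcpu;
	struct vmcb *vmcb01 = svm->vmcb01.ptr;

	if (svm->nmi_masked) {
		vmcb01->control.int_ctl |= V_NMI_MASK;
		/* Drop the software mask so it can't cause false positives later. */
		svm->nmi_masked = false;
	} else {
		vmcb01->control.int_ctl &= ~V_NMI_MASK;
	}

	if (vcpu->arch.nmi_pending) {
		vcpu->arch.nmi_pending--;
		vmcb01->control.int_ctl |= V_NMI_PENDING;
	} else {
		vmcb01->control.int_ctl &= ~V_NMI_PENDING;
	}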
>
> ...
>
>> +static bool svm_set_hw_nmi_pending(struct kvm_vcpu *vcpu)
>> +{
>> +	struct vcpu_svm *svm = to_svm(vcpu);
>> +
>> +	if (!is_vnmi_enabled(svm))
>> +		return false;
>> +
>> +	if (svm->vmcb->control.int_ctl & V_NMI_PENDING)
>> +		return false;
>> +
>> +	svm->vmcb->control.int_ctl |= V_NMI_PENDING;
>> +	vmcb_mark_dirty(svm->vmcb, VMCB_INTR);
>> +
>> +	/*
>> +	 * NMI isn't yet technically injected but
>> +	 * this rough estimation should be good enough
>
> Again, wrap at 80 chars, not at random points.
>

Ok.

>> +	 */
>> +	++vcpu->stat.nmi_injections;
>> +
>> +	return true;
>> +}
>> +
>
> ...
>
>>  bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>>  {
>>  	struct vcpu_svm *svm = to_svm(vcpu);
>> @@ -3725,10 +3772,16 @@ static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
>>  	/*
>>  	 * Something prevents NMI from been injected. Single step over possible
>>  	 * problem (IRET or exception injection or interrupt shadow)
>> +	 *
>> +	 * With vNMI we should never need an NMI window
>> +	 * (we can always inject vNMI either by setting VNMI_PENDING or by EVENTINJ)
>
> Please honor the soft limit and avoid pronouns. There's also no need to put the
> blurb in parentheses on its own line.
>
> As for the code, I believe there are bugs. Pulling in the context...
>
> 	if (svm->nmi_masked && !svm->awaiting_iret_completion)
> 		return; /* IRET will cause a vm exit */
>
> Checking nmi_masked is wrong, this should use the helper. Even if this code can
> only be reached on error, it should still try its best to not make things worse.

Right.

> 	if (!gif_set(svm)) {
> 		if (vgif)
> 			svm_set_intercept(svm, INTERCEPT_STGI);
> 		return; /* STGI will cause a vm exit */
> 	}
>
> 	/*
> 	 * Something prevents NMI from been injected. Single step over possible
> 	 * problem (IRET or exception injection or interrupt shadow)
> 	 *
> 	 * With vNMI we should never need an NMI window
> 	 * (we can always inject vNMI either by setting VNMI_PENDING or by EVENTINJ)
> 	 */
> 	if (WARN_ON_ONCE(is_vnmi_enabled(svm)))
> 		return;
>
> This is flawed, where "this" means handling of NMI windows when vNMI is enabled.
>
> IIUC, if there are back-to-back NMIs, the intent is to set V_NMI for one and
> inject the other. I believe there are multiple bugs in svm_inject_nmi(). The one
> that's definitely a bug is setting svm->nmi_masked. The other suspected bug,
> which is related to the above WARN, is setting the IRET intercept. The resulting
> IRET interception will set awaiting_iret_completion, and then the above WARN is
> reachable (even if the masking check is fixed).
>
> I don't think KVM needs to ever intercept IRET. One NMI gets injected, and the
> other is sitting in INT_CTL.V_NMI_PENDING, i.e. there's no need for KVM to regain
> control. If another NMI comes along before V_NMI_PENDING is handled, it will
> either get injected or dropped.
>
> So I _think_ KVM can skip the intercept code when injecting an NMI, and this WARN
> can be hoisted to the top of svm_enable_nmi_window(), because as stated above, KVM
> should never need to request an NMI window.
>
> Last thought, unless there's something that will obviously break, it's probably
> better to WARN and continue than to bail. I.e. do the single-step and hope for
> the best. Bailing at this point doesn't seem like it would help.
>
>>  	 */
>> +	if (WARN_ON_ONCE(is_vnmi_enabled(svm)))
>> +		return;
>> +
>>  	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
>> -	svm->nmi_singlestep = true;
>>  	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
>> +	svm->nmi_singlestep = true;
>>  }
>

Ok. So you mean: in vNMI mode KVM should never need to request an NMI window,
and if svm_enable_nmi_window() is reached anyway, WARN and continue to single
step. The modified code would look something like below:

static void svm_enable_nmi_window(struct kvm_vcpu *vcpu)
{
	struct vcpu_svm *svm = to_svm(vcpu);

	/*
	 * With vNMI, KVM should never need an NMI window. If this is reached
	 * anyway, WARN and continue to single step.
	 */
	WARN_ON_ONCE(is_vnmi_enabled(svm));

	if (svm_get_nmi_mask(vcpu) && !svm->awaiting_iret_completion)
		return; /* IRET will cause a vm exit */

	if (!gif_set(svm)) {
		if (vgif)
			svm_set_intercept(svm, INTERCEPT_STGI);
		return; /* STGI will cause a vm exit */
	}

	/*
	 * Something prevents NMI from been injected. Single step over possible
	 * problem (IRET or exception injection or interrupt shadow)
	 */
	svm->nmi_singlestep_guest_rflags = svm_get_rflags(vcpu);
	svm->vmcb->save.rflags |= (X86_EFLAGS_TF | X86_EFLAGS_RF);
	svm->nmi_singlestep = true;
}
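And following the same reasoning, svm_inject_nmi() would stop setting
nmi_masked and stop intercepting IRET when vNMI is in use; the tail of it
could be guarded roughly like the below (only a sketch, the exact surrounding
code may differ):

	svm->vmcb->control.event_inj = SVM_EVTINJ_VALID | SVM_EVTINJ_TYPE_NMI;

	/*
	 * With vNMI, hardware tracks NMI masking and a second NMI sits in
	 * V_NMI_PENDING, so KVM doesn't need the IRET intercept to find the
	 * end of the NMI window.
	 */
	if (!is_vnmi_enabled(svm)) {
		svm->nmi_masked = true;
		if (!sev_es_guest(vcpu->kvm))
			svm_set_intercept(svm, INTERCEPT_IRET);
	}

	++vcpu->stat.nmi_injections;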
Does that make sense?

Thanks,
Santosh

> ...
>
>> @@ -553,6 +554,15 @@ static inline bool is_x2apic_msrpm_offset(u32 offset)
>>  	       (msr < (APIC_BASE_MSR + 0x100));
>>  }
>>
>> +static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
>> +{
>> +	/* L1's vNMI is inhibited while nested guest is running */
>> +	if (is_guest_mode(&svm->vcpu))
>
> I would rather check the current VMCB. I don't see any value in hardcoding the
> "KVM doesn't support vNMI in L2" in multiple places. And I find the above comment
> about "L1's vNMI is inhibited" confusing. vNMI isn't inhibited/blocked, KVM just
> doesn't utilize vNMI while L2 is active (IIUC, as proposed).
>
>> +		return false;
>> +
>> +	return !!(svm->vmcb01.ptr->control.int_ctl & V_NMI_ENABLE);
>> +}
>> +
>>  /* svm.c */
>>  #define MSR_INVALID			0xffffffffU
>>
>> --
>> 2.26.3
>>
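Got it. Checking the VMCB KVM is currently running on would then be enough,
since vmcb02 never gets V_NMI_ENABLE while vNMI stays disabled for L2. Roughly
like this (sketch only):

static inline bool is_vnmi_enabled(struct vcpu_svm *svm)
{
	/*
	 * vmcb02 never enables vNMI (as proposed), so checking the current
	 * VMCB naturally returns false while L2 is active, without a
	 * hardcoded is_guest_mode() check.
	 */
	return !!(svm->vmcb->control.int_ctl & V_NMI_ENABLE);
}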