Sean Christopherson <seanjc@xxxxxxxxxx> writes:

> Use the recently introduced KVM_REQ_TRIPLE_FAULT to properly emulate
> shutdown if RSM from SMM fails.
>
> Note, entering shutdown after clearing the SMM flag and restoring NMI
> blocking is architecturally correct with respect to AMD's APM, which KVM
> also uses for SMRAM layout and RSM NMI blocking behavior. The APM says:
>
>   An RSM causes a processor shutdown if an invalid-state condition is
>   found in the SMRAM state-save area. Only an external reset, external
>   processor-initialization, or non-maskable external interrupt (NMI) can
>   cause the processor to leave the shutdown state.
>
> Of note is processor-initialization (INIT) as a valid shutdown wake
> event, as INIT is blocked by SMM, implying that entering shutdown also
> forces the CPU out of SMM.
>
> For recent Intel CPUs, restoring NMI blocking is technically wrong, but
> so is restoring NMI blocking in the first place, and Intel's RSM
> "architecture" is such a mess that just about anything is allowed and can
> be justified as micro-architectural behavior.
>
> Per the SDM:
>
>   On Pentium 4 and later processors, shutdown will inhibit INTR and A20M
>   but will not change any of the other inhibits. On these processors,
>   NMIs will be inhibited if no action is taken in the SMI handler to
>   uninhibit them (see Section 34.8).
>
> where Section 34.8 says:
>
>   When the processor enters SMM while executing an NMI handler, the
>   processor saves the SMRAM state save map but does not save the
>   attribute to keep NMI interrupts disabled. Potentially, an NMI could be
>   latched (while in SMM or upon exit) and serviced upon exit of SMM even
>   though the previous NMI handler has still not completed.
>
> I.e. RSM unconditionally unblocks NMI, but shutdown on RSM does not,
> which is in direct contradiction of KVM's behavior. But, as mentioned
> above, KVM follows AMD architecture and restores NMI blocking on RSM, so
> that micro-architectural detail is already lost.
>
> And for Pentium era CPUs, SMI# can break shutdown, meaning that at least
> some Intel CPUs fully leave SMM when entering shutdown:
>
>   In the shutdown state, Intel processors stop executing instructions
>   until a RESET#, INIT# or NMI# is asserted. While Pentium family
>   processors recognize the SMI# signal in shutdown state, P6 family and
>   Intel486 processors do not.
>
> In other words, the fact that Intel CPUs have implemented the two
> extremes gives KVM carte blanche when it comes to honoring Intel's
> architecture for handling shutdown during RSM.
>
> Signed-off-by: Sean Christopherson <seanjc@xxxxxxxxxx>
> ---
>  arch/x86/kvm/emulate.c     | 12 +++++++-----
>  arch/x86/kvm/kvm_emulate.h |  1 +
>  arch/x86/kvm/x86.c         |  6 ++++++
>  3 files changed, 14 insertions(+), 5 deletions(-)
>
> diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
> index 5e5de05a8fbf..0603a2c79093 100644
> --- a/arch/x86/kvm/emulate.c
> +++ b/arch/x86/kvm/emulate.c
> @@ -2683,7 +2683,7 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
>  	 * state-save area.
>  	 */
>  	if (ctxt->ops->pre_leave_smm(ctxt, buf))
> -		return X86EMUL_UNHANDLEABLE;
> +		goto emulate_shutdown;
>
>  #ifdef CONFIG_X86_64
>  	if (emulator_has_longmode(ctxt))
> @@ -2692,14 +2692,16 @@ static int em_rsm(struct x86_emulate_ctxt *ctxt)
>  #endif
>  		ret = rsm_load_state_32(ctxt, buf);
>
> -	if (ret != X86EMUL_CONTINUE) {
> -		/* FIXME: should triple fault */
> -		return X86EMUL_UNHANDLEABLE;
> -	}
> +	if (ret != X86EMUL_CONTINUE)
> +		goto emulate_shutdown;
>
>  	ctxt->ops->post_leave_smm(ctxt);
>
>  	return X86EMUL_CONTINUE;
> +
> +emulate_shutdown:
> +	ctxt->ops->triple_fault(ctxt);
> +	return X86EMUL_UNHANDLEABLE;

I'm probably missing something, but what's the desired effect of both
raising KVM_REQ_TRIPLE_FAULT and returning X86EMUL_UNHANDLEABLE here?

I've modified the smm selftest to see what's happening:

diff --git a/tools/testing/selftests/kvm/x86_64/smm_test.c b/tools/testing/selftests/kvm/x86_64/smm_test.c
index 613c42c5a9b8..cf215cd2c6e2 100644
--- a/tools/testing/selftests/kvm/x86_64/smm_test.c
+++ b/tools/testing/selftests/kvm/x86_64/smm_test.c
@@ -147,6 +147,11 @@ int main(int argc, char *argv[])
 			    "Unexpected stage: #%x, got %x",
 			    stage, stage_reported);
 
+		if (stage_reported == SMRAM_STAGE) {
+			/* corrupt smram */
+			memset(addr_gpa2hva(vm, SMRAM_GPA) + 0xfe00, 0xff, 512);
+		}
+
 		state = vcpu_save_state(vm, VCPU_ID);
 		kvm_vm_release(vm);
 		kvm_vm_restart(vm, O_RDWR);

What I see is:

 smm_test-7600  [002]  4497.073918: kvm_exit: reason EXIT_RSM rip 0x8004 info 0 0
 smm_test-7600  [002]  4497.073921: kvm_emulate_insn: 1000000:8004: 0f aa
 smm_test-7600  [002]  4497.073924: kvm_smm_transition: vcpu 1: leaving SMM, smbase 0x1000000
 smm_test-7600  [002]  4497.073928: kvm_emulate_insn: 0:8004: 0f aa FAIL
 smm_test-7600  [002]  4497.073929: kvm_fpu: unload
 smm_test-7600  [002]  4497.073930: kvm_userspace_exit: reason KVM_EXIT_INTERNAL_ERROR (17)

If I change X86EMUL_UNHANDLEABLE to X86EMUL_CONTINUE, the triple fault
does happen (BTW, why don't we print triple faults in the trace by
default?):

 smm_test-16810 [006]  5117.007220: kvm_exit: reason EXIT_RSM rip 0x8004 info 0 0
 smm_test-16810 [006]  5117.007222: kvm_emulate_insn: 1000000:8004: 0f aa
 smm_test-16810 [006]  5117.007225: kvm_smm_transition: vcpu 1: leaving SMM, smbase 0x1000000
 smm_test-16810 [006]  5117.007229: bputs: vcpu_enter_guest: KVM_REQ_TRIPLE_FAULT
 smm_test-16810 [006]  5117.007230: kvm_fpu: unload
 smm_test-16810 [006]  5117.007230: kvm_userspace_exit: reason KVM_EXIT_SHUTDOWN (8)

So should we actually return X86EMUL_CONTINUE when we queue
KVM_REQ_TRIPLE_FAULT here?

(Initially, my comment was supposed to be 'why don't you add
TRIPLE_FAULT to the smm selftest?' but the above overshadows it.)
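For illustration, the tail of em_rsm() I have in mind -- a completely
untested sketch, not necessarily what the final patch should look like:

emulate_shutdown:
	ctxt->ops->triple_fault(ctxt);
	return X86EMUL_CONTINUE;

i.e. report success from the emulator and rely on the queued
KVM_REQ_TRIPLE_FAULT being serviced on the next vcpu_enter_guest(),
which, if I remember correctly, does roughly:

	if (kvm_check_request(KVM_REQ_TRIPLE_FAULT, vcpu)) {
		vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN;
		r = 0;
		goto out;
	}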
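In case it's still useful, the kind of check I had in mind -- a
hypothetical, untested sketch on top of the SMRAM-corrupting hack above,
reusing the existing 'vm' and 'run' variables from smm_test.c:

	/* RSM from the corrupted state-save area should shut the vCPU down. */
	_vcpu_run(vm, VCPU_ID);
	TEST_ASSERT(run->exit_reason == KVM_EXIT_SHUTDOWN,
		    "Unexpected exit reason: %u (%s)",
		    run->exit_reason, exit_reason_str(run->exit_reason));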
>  }
>
>  static void
> diff --git a/arch/x86/kvm/kvm_emulate.h b/arch/x86/kvm/kvm_emulate.h
> index 3e870bf9ca4d..9c34aa60e45f 100644
> --- a/arch/x86/kvm/kvm_emulate.h
> +++ b/arch/x86/kvm/kvm_emulate.h
> @@ -233,6 +233,7 @@ struct x86_emulate_ops {
>  	int (*pre_leave_smm)(struct x86_emulate_ctxt *ctxt,
>  			     const char *smstate);
>  	void (*post_leave_smm)(struct x86_emulate_ctxt *ctxt);
> +	void (*triple_fault)(struct x86_emulate_ctxt *ctxt);
>  	int (*set_xcr)(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr);
>  };
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 54d212fe9b15..cda148cf06fa 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -7123,6 +7123,11 @@ static void emulator_post_leave_smm(struct x86_emulate_ctxt *ctxt)
>  	kvm_smm_changed(emul_to_vcpu(ctxt));
>  }
>
> +static void emulator_triple_fault(struct x86_emulate_ctxt *ctxt)
> +{
> +	kvm_make_request(KVM_REQ_TRIPLE_FAULT, emul_to_vcpu(ctxt));
> +}
> +
>  static int emulator_set_xcr(struct x86_emulate_ctxt *ctxt, u32 index, u64 xcr)
>  {
>  	return __kvm_set_xcr(emul_to_vcpu(ctxt), index, xcr);
> @@ -7172,6 +7177,7 @@ static const struct x86_emulate_ops emulate_ops = {
>  	.set_hflags = emulator_set_hflags,
>  	.pre_leave_smm = emulator_pre_leave_smm,
>  	.post_leave_smm = emulator_post_leave_smm,
> +	.triple_fault = emulator_triple_fault,
>  	.set_xcr = emulator_set_xcr,
>  };

-- 
Vitaly