On 26/04/2017 08:34, Ladi Prosek wrote:
> On Tue, Apr 25, 2017 at 6:38 PM, David Hildenbrand <david@xxxxxxxxxx> wrote:
>> On 25.04.2017 16:42, Ladi Prosek wrote:
>>> On AMD, the effect of set_nmi_mask called by emulate_iret_real and em_rsm
>>> on hflags is reverted later on in x86_emulate_instruction where hflags are
>>> overwritten with ctxt->emul_flags (the kvm_set_hflags call). This manifests
>>> as a hang when rebooting Windows VMs with QEMU, OVMF, and >1 vcpu.
>>>
>>> Instead of trying to merge ctxt->emul_flags into vcpu->arch.hflags after
>>> an instruction is emulated, this commit deletes emul_flags altogether and
>>> makes the emulator access vcpu->arch.hflags using two new accessors. This
>>> way all changes, on the emulator side as well as in functions called from
>>> the emulator and accessing vcpu state with emul_to_vcpu, are preserved.
>>>
>>> More details on the bug and its manifestation with Windows and OVMF:
>>>
>>> It's a KVM bug in the interaction between SMI/SMM and NMI, specific to AMD.
>>> I believe that the SMM part explains why we started seeing this only with
>>> OVMF.
>>>
>>> KVM masks and unmasks NMI when entering and leaving SMM. When KVM emulates
>>> the RSM instruction in em_rsm, the set_nmi_mask call doesn't stick because
>>> later on in x86_emulate_instruction we overwrite arch.hflags with
>>> ctxt->emul_flags, effectively reverting the effect of the set_nmi_mask call.
>>> The AMD-specific hflag of interest here is HF_NMI_MASK.
>>>
>>> When rebooting the system, Windows sends an NMI IPI to all but the current
>>> cpu to shut them down. Only after all of them are parked in HLT will the
>>> initiating cpu finish the restart. If NMI is masked, other cpus never get
>>> the memo and the initiating cpu spins forever, waiting for
>>> hal!HalpInterruptProcessorsStarted to drop. That's the symptom we observe.
>>>
>>> Fixes: a584539b24b8 ("KVM: x86: pass the whole hflags field to emulator and back")
>>> Signed-off-by: Ladi Prosek <lprosek@xxxxxxxxxx>
>> [...]
>>>
>>> -	if ((ctxt->emul_flags & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
>>> +	if ((ctxt->ops->get_hflags(ctxt) & X86EMUL_SMM_INSIDE_NMI_MASK) == 0)
>>>  		ctxt->ops->set_nmi_mask(ctxt, false);
>>>
>>> -	ctxt->emul_flags &= ~X86EMUL_SMM_INSIDE_NMI_MASK;
>>> -	ctxt->emul_flags &= ~X86EMUL_SMM_MASK;
>>> +	ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
>>> +			      ~X86EMUL_SMM_INSIDE_NMI_MASK &
>>> +			      ~X86EMUL_SMM_MASK);
>>
>> Wonder if that would look better with
>> & ~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK)
>
> Yes, it definitely would.
>
>> [...]
>>>
>>>  static void toggle_interruptibility(struct kvm_vcpu *vcpu, u32 mask)
>>> @@ -5314,7 +5326,6 @@ static void init_emulate_ctxt(struct kvm_vcpu *vcpu)
>>>  	BUILD_BUG_ON(HF_GUEST_MASK != X86EMUL_GUEST_MASK);
>>>  	BUILD_BUG_ON(HF_SMM_MASK != X86EMUL_SMM_MASK);
>>>  	BUILD_BUG_ON(HF_SMM_INSIDE_NMI_MASK != X86EMUL_SMM_INSIDE_NMI_MASK);
>>> -	ctxt->emul_flags = vcpu->arch.hflags;
>>>
>>>  	init_decode_cache(ctxt);
>>>  	vcpu->arch.emulate_regs_need_sync_from_vcpu = false;
>>> @@ -5718,8 +5729,6 @@ int x86_emulate_instruction(struct kvm_vcpu *vcpu,
>>>  		unsigned long rflags = kvm_x86_ops->get_rflags(vcpu);
>>>  		toggle_interruptibility(vcpu, ctxt->interruptibility);
>>>  		vcpu->arch.emulate_regs_need_sync_to_vcpu = false;
>>> -		if (vcpu->arch.hflags != ctxt->emul_flags)
>>> -			kvm_set_hflags(vcpu, ctxt->emul_flags);
>>
>> I like to see that go.
>>
>>>  		kvm_rip_write(vcpu, ctxt->eip);
>>>  		if (r == EMULATE_DONE)
>>>  			kvm_vcpu_check_singlestep(vcpu, rflags, &r);
>>>
>>
>> Looks very good to me!
>
> Thank you for the review!

Fixed and queued.

Paolo
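
[Editor's note] For reference, a minimal sketch of what the two new accessors
described in the commit message could look like on the KVM side. The function
names (emulator_get_hflags/emulator_set_hflags) and the exact placement are
assumptions for illustration; the thread only confirms that the emulator
reaches vcpu->arch.hflags via emul_to_vcpu and kvm_set_hflags:

    /*
     * Sketch: emulator callbacks that read/write hflags through live vcpu
     * state (function names are assumed, not taken from the thread).
     */
    static unsigned emulator_get_hflags(struct x86_emulate_ctxt *ctxt)
    {
            /* Read the live hflags instead of a cached emul_flags copy. */
            return emul_to_vcpu(ctxt)->arch.hflags;
    }

    static void emulator_set_hflags(struct x86_emulate_ctxt *ctxt,
                                    unsigned emul_flags)
    {
            /*
             * Write through immediately. Since no private emul_flags copy
             * remains to be synced back in x86_emulate_instruction, changes
             * made via set_nmi_mask during emulation (e.g. in em_rsm) stick.
             */
            kvm_set_hflags(emul_to_vcpu(ctxt), emul_flags);
    }

Both would then be wired into the emulator's ops table so that
ctxt->ops->get_hflags(ctxt) and ctxt->ops->set_hflags(ctxt, ...) in the quoted
em_rsm hunk resolve to them.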
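
And with David's & ~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK)
suggestion folded in, the flag-clearing statement in em_rsm would read:

    ctxt->ops->set_hflags(ctxt, ctxt->ops->get_hflags(ctxt) &
                          ~(X86EMUL_SMM_INSIDE_NMI_MASK | X86EMUL_SMM_MASK));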