On Sat, Jul 1, 2023 at 6:56 AM Sean Christopherson <seanjc@xxxxxxxxxx> wrote: > > mn Fri, Jun 30, 2023, Chao Gao wrote: > > On Fri, Jun 30, 2023 at 03:26:12PM +0800, Qi Ai wrote: > > >+ !is_protmode(vcpu)) > > >+ kvm_x86_ops.clear_hlt(vcpu); > > > > Use static_call_cond(kvm_x86_clear_hlt)(vcpu) instead. > > > > It looks incorrect that we add this side-effect heuristically here. I > > Yeah, adding heuristics to KVM_SET_REGS isn't happening. KVM's existing hack > for "Older userspace" in __set_sregs_common() is bad enough: > > /* Older userspace won't unhalt the vcpu on reset. */ > if (kvm_vcpu_is_bsp(vcpu) && kvm_rip_read(vcpu) == 0xfff0 && > sregs->cs.selector == 0xf000 && sregs->cs.base == 0xffff0000 && > !is_protmode(vcpu)) > vcpu->arch.mp_state = KVM_MP_STATE_RUNNABLE; > > > am wondering if we can link vcpu->arch.mp_state to VMCS activity state, > > Hrm, maybe. > > > i.e., when mp_state is set to RUNNABLE in KVM_SET_MP_STATE ioctl, KVM > > sets VMCS activity state to active. > > Not in the ioctl(), there needs to be a proper set of APIs, e.g. so that the > existing hack works, and so that KVM actually reports out to userspace that a > vCPU is HALTED if userspace gained control of the vCPU, e.g. after an IRQ exit, > while the vCPU was HALTED. I.e. mp_state versus vmcs.ACTIVITY_STATE needs to be > bidirectional, not one-way. E.g. if a vCPU is live migrated, I'm pretty sure > vmcs.ACTIVITY_STATE is lost, which is wrong. > > The downside is that if KVM propagates vmcs.ACTIVITY_STATE to mp_state for the > halted case, then KVM will enter kvm_vcpu_halt() instead of entering the guest > in halted state, which is undesirable. Hmm, that can be handled by treating > the vCPU as running, e.g. > > static inline bool kvm_vcpu_running(struct kvm_vcpu *vcpu) > { > return (vcpu->arch.mp_state == KVM_MP_STATE_RUNNABLE || > (vcpu->arch.mp_state == KVM_MP_STATE_HALTED && > kvm_hlt_in_guest(vcpu->kvm))) && > !vcpu->arch.apf.halted); > } > > but that would have cascading effect to a whole pile of things. I don't *think* > they'd be used with kvm_hlt_in_guest(), but we've had weirder stuff. > > I'm half tempted to solve this particular issue by stuffing vmcs.ACTIVITY_STATE on > shutdown, similar to what SVM does on shutdown interception. KVM doesn't come > anywhere near faithfully emulating shutdown, so it's unlikely to break anything. > And then the mp_state vs. hlt_in_guest coulbe tackled separately. Ugh, but that > wouldn't cover a synthesized KVM_REQ_TRIPLE_FAULT. > > diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c > index 44fb619803b8..ee4bb37067d1 100644 > --- a/arch/x86/kvm/vmx/vmx.c > +++ b/arch/x86/kvm/vmx/vmx.c > @@ -5312,6 +5312,8 @@ static __always_inline int handle_external_interrupt(struct kvm_vcpu *vcpu) > > static int handle_triple_fault(struct kvm_vcpu *vcpu) > { > + vmcs_write32(GUEST_ACTIVITY_STATE, GUEST_ACTIVITY_ACTIVE); > + > vcpu->run->exit_reason = KVM_EXIT_SHUTDOWN; > vcpu->mmio_needed = 0; > return 0; > > > I don't suppose QEMU can to blast INIT at all vCPUs for this case? Reproduce this problem need to use the cpu_pm=on in QEMU, so execute halt in vm doesn't cause a vm exit, so mp_state will never be HLT. I am confused why mp_state is considered in this case. And the bsp's vmcs.ACTIVITY_STATE need to reset to ACTIVITY to solve this problem. We need a proper set of APIs as you say. In this case, do we only provide a reset ioctl, or do we need to report vmcs.ACTIVITY_STATE to the userspace?