On Wed, Jul 05, 2023, Chao Gao wrote: > On Fri, Jun 30, 2023 at 03:56:14PM -0700, Sean Christopherson wrote: > >mn Fri, Jun 30, 2023, Chao Gao wrote: > >> am wondering if we can link vcpu->arch.mp_state to VMCS activity state, > > > >Hrm, maybe. > > > >> i.e., when mp_state is set to RUNNABLE in KVM_SET_MP_STATE ioctl, KVM > >> sets VMCS activity state to active. > > > >Not in the ioctl(), there needs to be a proper set of APIs, e.g. so that the > >existing hack works, > > I don't get why the existing hack will be broken if we piggyback on the > KVM_GET/SET_MP_STATE ioctl(). The hack is for "Older userspace" back to > 2008. I suppose the "Older userspace" doesn't support disabling hlt exit. True, it does seem extremely unlikely that the two would collide. And even if there is some bizarre userspace that relies on the old hack *and* exposes HLT, fixing this in set_mpstate() wouldn't make the problem worse, i.e. there's no reason to hold up the fix. Qi, for v2, can you handle this in kvm_arch_vcpu_ioctl_set_mpstate() instead of __set_regs()? And wire up the new hook to support a static_call(), e.g. I think the call site would be: diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 05a68d7d99fe..92d255fb00c5 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -11411,6 +11411,10 @@ int kvm_arch_vcpu_ioctl_set_mpstate(struct kvm_vcpu *vcpu, vcpu->arch.mp_state = mp_state->mp_state; kvm_make_request(KVM_REQ_EVENT, vcpu); + if (kvm_hlt_in_guest(vcpu->kvm) && + mp_state->mp_state != KVM_MP_STATE_HALTED) + static_call_cond(kvm_x86_clear_hlt)(vcpu); + ret = 0; out: vcpu_put(vcpu); > >and so that KVM actually reports out to userspace that a > >vCPU is HALTED if userspace gained control of the vCPU, e.g. after an IRQ exit, > >while the vCPU was HALTED. I.e. mp_state versus vmcs.ACTIVITY_STATE needs to be > >bidirectional, not one-way. E.g. if a vCPU is live migrated, I'm pretty sure > >vmcs.ACTIVITY_STATE is lost, which is wrong. > > Yes. Agreed. Fixing this is going to be painful. Reporting KVM_MP_STATE_HALTED would cause problems if a VM is migrated from a KVM with the fix to a KVM without the fix, as the "bad" KVM wouldn't know it should treat the guest as running and actually put the vCPU into HLT. We can't use a new KVM_MP_STATE because that would fully break migration (*sigh*). That means we probably need to add a flag somewhere, on top of also teaching KVM to treat the guest as runnable. So as above, I think it makes sense to get the immediate bug fixed and worry about migrating HLT state in the future, especially since KVM *can't* migrate HLT state on SVM. > >I'm half tempted to solve this particular issue by stuffing vmcs.ACTIVITY_STATE on > >shutdown, similar to what SVM does on shutdown interception. KVM doesn't come > >anywhere near faithfully emulating shutdown, so it's unlikely to break anything. > >And then the mp_state vs. hlt_in_guest coulbe tackled separately. Ugh, but that > >wouldn't cover a synthesized KVM_REQ_TRIPLE_FAULT. > > I traced the process of guest reboot but I didn't see triple-fault > VMExit. So, I don't think this can fix the issue. Ah, right. No idea why I jumped to the conclusion that there would be a triple fault, the changelog even specifically calls out that the reboot happens after kdump runs. Even if there were a triple fault, that wouldn't cover all vCPUs.