On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> Model inactive VTL vCPUs' behaviour with a new MP state.
>
> Inactive VTLs are in an artificial halt state. They enter into this
> state in response to invoking HvCallVtlCall, HvCallVtlReturn.
> User-space, which is VTL aware, can process the hypercall and set the
> vCPU in MP_STATE_HV_INACTIVE_VTL. When a vCPU is run in this state it'll
> block until a wakeup event is received. The rules of what constitutes an
> event are analogous to halt's, except that VTLs ignore RFLAGS.IF.
>
> When a wakeup event is registered, KVM will exit to user-space with a
> KVM_SYSTEM_EVENT exit, and KVM_SYSTEM_EVENT_WAKEUP event type.
> User-space is responsible for deciding whether the event has precedence
> over the active VTL, and will switch the vCPU to KVM_MP_STATE_RUNNABLE
> before resuming execution on it.
>
> Running a KVM_MP_STATE_HV_INACTIVE_VTL vCPU with pending events will
> return immediately to user-space.
>
> Note that by re-using the readily available halt infrastructure in
> KVM_RUN, MP_STATE_HV_INACTIVE_VTL correctly handles (or disables)
> virtualisation features like the VMX preemption timer or APICv before
> blocking.

IIUC, this is a convoluted and roundabout way to let userspace check if a
vCPU has a wake event, correct?  Even by the end of the series, KVM never
sets MP_STATE_HV_INACTIVE_VTL, i.e. the only use for this is to combine it
as:

  KVM_SET_MP_STATE => KVM_RUN => KVM_SET_MP_STATE => KVM_RUN

The upside to this approach is that it requires minimal uAPI and very few
KVM changes, but that's about it AFAICT.  On the other hand, making this
so painfully specific feels like a missed opportunity, and unnecessarily
bleeds VTL details into KVM.

Bringing halt-polling into the picture (by going down kvm_vcpu_halt()) is
also rather bizarre, since quite a bit of time has already elapsed since
the vCPU first did HvCallVtlCall/HvCallVtlReturn.  But that doesn't really
have anything to do with MP_STATE_HV_INACTIVE_VTL, e.g. it'd be just as
easy to go to kvm_vcpu_block().

Why not add an ioctl() to very explicitly block until a wake event is
ready?  Or probably better, a generic "wait" ioctl() that takes the wait
type as an argument.  Kinda like your idea of supporting .poll() on the
vCPU FD[*], except it's very specifically restricted to a single caller
(takes vcpu->mutex).  We could probably actually implement it via .poll(),
but I suspect that would be more confusing than helpful.

E.g. extract the guts of vcpu_block() to a separate helper, and then wire
that up to an ioctl().
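Very roughly, and completely untested (KVM_WAIT and its bare-bones "no
wait types yet" semantics are strawmen, and I'm hand-waving away the
vcpu_load()/SRCU details):

  /* arch/x86/kvm/x86.c: the guts of vcpu_block(), sans mp_state games. */
  static void kvm_vcpu_wait_for_wake_event(struct kvm_vcpu *vcpu)
  {
	/*
	 * Switch to the software timer before blocking, as the hypervisor
	 * timer runs only when the CPU is in guest mode.
	 */
	bool hv_timer = kvm_lapic_hv_timer_in_use(vcpu);

	if (hv_timer)
		kvm_lapic_switch_to_sw_timer(vcpu);

	/*
	 * Block until there's a wake event or a pending signal.  Note,
	 * this deliberately skips halt-polling, which as above makes no
	 * sense this long after the original hypercall.
	 */
	kvm_vcpu_block(vcpu);

	if (hv_timer)
		kvm_lapic_switch_to_hw_timer(vcpu);
  }

and then in kvm_arch_vcpu_ioctl():

  case KVM_WAIT:
	r = -EINVAL;
	if (arg)
		break;
	kvm_vcpu_wait_for_wake_event(vcpu);
	r = signal_pending(current) ? -EINTR : 0;
	break;

That also gives us an obvious place to bolt on a wait-type argument later,
without dragging mp_state into the picture.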
As for the RFLAGS.IF quirk, maybe handle that via a kvm_run flag?  That
way, userspace doesn't need to do a round-trip just to set a single bit.
E.g. I think we should be able to squeeze it into "struct kvm_hyperv_exit".

Actually, speaking of kvm_hyperv_exit, is there a reason we can't simply
wire up HVCALL_VTL_CALL and/or HVCALL_VTL_RETURN to a dedicated
complete_userspace_io() callback that blocks if some flag is set?  That
would make it _much_ cleaner to scope the RFLAGS.IF check to
kvm_hyperv_exit, and would require little to no new uAPI.

> @@ -3797,6 +3798,10 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>  	if (!gif_set(svm))
>  		return true;
>
> +	/*
> +	 * The Hyper-V TLFS states that RFLAGS.IF is ignored when deciding
> +	 * whether to block interrupts targeted at inactive VTLs.
> +	 */
>  	if (is_guest_mode(vcpu)) {
>  		/* As long as interrupts are being delivered... */
>  		if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
> @@ -3808,7 +3813,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
>  		if (nested_exit_on_intr(svm))
>  			return false;
>  	} else {
> -		if (!svm_get_if_flag(vcpu))
> +		if (!svm_get_if_flag(vcpu) && !kvm_hv_vcpu_is_idle_vtl(vcpu))

Speaking of RFLAGS.IF, I think it makes sense to add a common x86 helper
to handle the RFLAGS.IF vs. idle VTL logic.  Naming will be annoying, but
that's about it.  E.g. kvm_is_irq_blocked_by_rflags_if() or so.
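Completely untested, and with the vendor code passing in its own view of
RFLAGS.IF so that SVM can keep using svm_get_if_flag():

  static inline bool kvm_is_irq_blocked_by_rflags_if(struct kvm_vcpu *vcpu,
						     bool if_flag)
  {
	/*
	 * Per the TLFS, RFLAGS.IF is ignored when deciding whether or not
	 * to block IRQs targeted at inactive (idle) VTLs.
	 */
	return !if_flag && !kvm_hv_vcpu_is_idle_vtl(vcpu);
  }

with the SVM chunk above becoming:

	} else {
		if (kvm_is_irq_blocked_by_rflags_if(vcpu,
						    svm_get_if_flag(vcpu)))
			return true;
	}

[*] https://lore.kernel.org/lkml/20231001111313.77586-1-nsaenz@xxxxxxxxxx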