On Fri Sep 13, 2024 at 7:01 PM UTC, Sean Christopherson wrote:
> On Sun, Jun 09, 2024, Nicolas Saenz Julienne wrote:
> > Model inactive VTL vCPUs' behaviour with a new MP state.
> >
> > Inactive VTLs are in an artificial halt state. They enter into this
> > state in response to invoking HvCallVtlCall or HvCallVtlReturn.
> > User-space, which is VTL aware, can process the hypercall and set the
> > vCPU to MP_STATE_HV_INACTIVE_VTL. When a vCPU is run in this state it'll
> > block until a wakeup event is received. The rules of what constitutes an
> > event are analogous to halt's, except that VTLs ignore RFLAGS.IF.
> >
> > When a wakeup event is registered, KVM will exit to user-space with a
> > KVM_SYSTEM_EVENT exit and the KVM_SYSTEM_EVENT_WAKEUP event type.
> > User-space is responsible for deciding whether the event has precedence
> > over the active VTL and will switch the vCPU to KVM_MP_STATE_RUNNABLE
> > before resuming execution on it.
> >
> > Running a KVM_MP_STATE_HV_INACTIVE_VTL vCPU with pending events will
> > return immediately to user-space.
> >
> > Note that by re-using the readily available halt infrastructure in
> > KVM_RUN, MP_STATE_HV_INACTIVE_VTL correctly handles (or disables)
> > virtualisation features like the VMX preemption timer or APICv before
> > blocking.
>
> IIUC, this is a convoluted and roundabout way to let userspace check if a vCPU
> has a wake event, correct? Even by the end of the series, KVM never sets
> MP_STATE_HV_INACTIVE_VTL, i.e. the only use for this is to combine it as:
>
>	KVM_SET_MP_STATE => KVM_RUN => KVM_SET_MP_STATE => KVM_RUN

Correct.

> The upside to this approach is that it requires minimal uAPI and very few KVM
> changes, but that's about it AFAICT. On the other hand, making this so painfully
> specific feels like a missed opportunity, and unnecessarily bleeds VTL details
> into KVM.
>
> Bringing halt-polling into the picture (by going down kvm_vcpu_halt()) is also
> rather bizarre since quite a bit of time has already elapsed since the vCPU first
> did HvCallVtlCall/HvCallVtlReturn. But that doesn't really have anything to do
> with MP_STATE_HV_INACTIVE_VTL, e.g. it'd be just as easy to go to kvm_vcpu_block().
>
> Why not add an ioctl() to very explicitly block until a wake event is ready?
> Or probably better, a generic "wait" ioctl() that takes the wait type as an
> argument.
>
> Kinda like your idea of supporting .poll() on the vCPU FD[*], except it's very
> specifically restricted to a single caller (takes vcpu->mutex). We could probably
> actually implement it via .poll(), but I suspect that would be more confusing than
> helpful.
>
> E.g. extract the guts of vcpu_block() to a separate helper, and then wire that
> up to an ioctl().
>
> As for the RFLAGS.IF quirk, maybe handle that via a kvm_run flag? That way,
> userspace doesn't need to do a round-trip just to set a single bit. E.g. I think
> we should be able to squeeze it into "struct kvm_hyperv_exit".

It's things like the RFLAGS.IF exemption that deterred me from building
a generic interface. We might find out that the generic blocking logic
doesn't match the expected VTL semantics and be stuck with a uAPI that
isn't enough for VSM, nor useful for any other use-case. We can always
introduce 'flags', I guess. Note that I'm just being cautious here;
AFAICT the generic approach works, and I'm fine with going the "wait"
ioctl route.
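
For the sake of discussion, this is roughly what I'd picture for the
"wait" ioctl (untested sketch; KVM_WAIT_IGNORE_RFLAGS_IF and the
'wait_ignores_if' field are invented names, and the LAPIC hv-timer
juggling that vcpu_block() does is elided):

	static int kvm_vcpu_ioctl_wait(struct kvm_vcpu *vcpu, u64 flags)
	{
		int r = 0;

		if (flags & ~KVM_WAIT_IGNORE_RFLAGS_IF)
			return -EINVAL;

		/*
		 * Made-up per-vCPU flag, consumed by the interrupt
		 * blocking checks so that idle VTLs ignore RFLAGS.IF.
		 */
		vcpu->arch.wait_ignores_if = flags & KVM_WAIT_IGNORE_RFLAGS_IF;

		/*
		 * kvm_vcpu_block() returns once a wake event or a signal
		 * is pending; no halt-polling on this path, as you noted.
		 */
		while (!kvm_arch_vcpu_runnable(vcpu) && !r) {
			kvm_vcpu_block(vcpu);
			if (signal_pending(current))
				r = -EINTR;
		}

		vcpu->arch.wait_ignores_if = false;
		return r;
	}

The ioctl wrapper would just copy 'flags' in from user-space and call
this under vcpu->mutex, like any other vCPU ioctl.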
> Actually, speaking of kvm_hyperv_exit, is there a reason we can't simply wire up
> HVCALL_VTL_CALL and/or HVCALL_VTL_RETURN to a dedicated complete_userspace_io()
> callback that blocks if some flag is set? That would make it _much_ cleaner to
> scope the RFLAGS.IF check to kvm_hyperv_exit, and would require little to no new
> uAPI.

So IIUC, the approach is to have the complete_userspace_io() callback
block when re-entering the vCPU after the HVCALL_VTL_RETURN exit, and
then have it exit back to user-space whenever an event becomes
available (maybe re-using KVM_SYSTEM_EVENT_WAKEUP?). That would work,
but it will need something extra to be compatible with
migration/live-update.

> > @@ -3797,6 +3798,10 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> >  	if (!gif_set(svm))
> >  		return true;
> >
> > +	/*
> > +	 * The Hyper-V TLFS states that RFLAGS.IF is ignored when deciding
> > +	 * whether to block interrupts targeted at inactive VTLs.
> > +	 */
> >  	if (is_guest_mode(vcpu)) {
> >  		/* As long as interrupts are being delivered... */
> >  		if ((svm->nested.ctl.int_ctl & V_INTR_MASKING_MASK)
> > @@ -3808,7 +3813,7 @@ bool svm_interrupt_blocked(struct kvm_vcpu *vcpu)
> >  		if (nested_exit_on_intr(svm))
> >  			return false;
> >  	} else {
> > -		if (!svm_get_if_flag(vcpu))
> > +		if (!svm_get_if_flag(vcpu) && !kvm_hv_vcpu_is_idle_vtl(vcpu))
>
> Speaking of RFLAGS.IF, I think it makes sense to add a common x86 helper to handle
> the RFLAGS.IF vs. idle VTL logic. Naming will be annoying, but that's about it.
>
> E.g. kvm_is_irq_blocked_by_rflags_if() or so.

Noted.

Thanks,
Nicolas
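
P.S. For the common helper, I was thinking of something along these
lines (untested; the signature is a guess, kvm_hv_vcpu_is_idle_vtl()
is the helper introduced by this series):

	/*
	 * RFLAGS.IF doesn't gate interrupt delivery for idle (inactive)
	 * VTLs, per the Hyper-V TLFS.
	 */
	static inline bool kvm_is_irq_blocked_by_rflags_if(struct kvm_vcpu *vcpu,
							   bool if_set)
	{
		return !if_set && !kvm_hv_vcpu_is_idle_vtl(vcpu);
	}

with the SVM hunk above becoming:

	if (kvm_is_irq_blocked_by_rflags_if(vcpu, svm_get_if_flag(vcpu)))
		return true;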