On Wed, Aug 17, 2022, Paolo Bonzini wrote:
> On 8/17/22 01:34, Sean Christopherson wrote:
> > Isn't freeing up the return from kvm_vcpu_check_block() unnecessary?  Can't we
> > just do:
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9f11b505cbee..ccb9f8bdeb18 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10633,7 +10633,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
> >  		if (hv_timer)
> >  			kvm_lapic_switch_to_hv_timer(vcpu);
> >
> > -		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
> > +		if (!kvm_arch_vcpu_runnable(vcpu))
> >  			return 1;
> >  	}
> >
> > which IMO is more intuitive and doesn't require reworking halt-polling (again).
>
> This was my first idea indeed.  However I didn't like calling
> kvm_arch_vcpu_runnable() again, and "did it schedule()" seemed to be a less
> interesting result from kvm_vcpu_block() (and in fact kvm_vcpu_halt() does
> not bother passing it up the return chain).

The flip side of calling kvm_arch_vcpu_runnable() again is that KVM will
immediately wake the vCPU if it becomes runnable after kvm_vcpu_check_block().
The edge cases where the vCPU becomes runnable late are unlikely to truly
matter in practice, but on the other hand manually re-checking
kvm_arch_vcpu_runnable() means KVM gets both cases "right" (waited=true iff
the vCPU actually waited, vCPU awakened ASAP), whereas squishing the
information into the return of kvm_vcpu_check_block() means KVM gets both
cases "wrong" (waited=true even if schedule() was never called, vCPU left in
a non-running state even though it's runnable).

My only hesitation with calling kvm_arch_vcpu_runnable() again is that it
could be problematic if KVM somehow managed to consume the event that caused
kvm_vcpu_has_events() to return true, but I don't see how that could happen
without it being a KVM bug.

Ah, and if we do go with the explicit check, it should come with a comment to
call out that KVM needs to return up the stack, e.g.

		/*
		 * If KVM stopped blocking the vCPU but the vCPU is still not
		 * runnable, then there is a pending host event of some kind,
		 * e.g. a pending signal.  Return up the stack to service the
		 * pending event without changing the vCPU's activity state.
		 */
		if (!kvm_arch_vcpu_runnable(vcpu))
			return 1;

so that we don't end up combining the checks into:

	while (!kvm_arch_vcpu_runnable(vcpu)) {
		/*
		 * Switch to the software timer before halt-polling/blocking as
		 * the guest's timer may be a break event for the vCPU, and the
		 * hypervisor timer runs only when the CPU is in guest mode.
		 * Switch before halt-polling so that KVM recognizes an expired
		 * timer before blocking.
		 */
		hv_timer = kvm_lapic_hv_timer_in_use(vcpu);
		if (hv_timer)
			kvm_lapic_switch_to_sw_timer(vcpu);

		kvm_vcpu_srcu_read_unlock(vcpu);
		if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
			kvm_vcpu_halt(vcpu);
		else
			kvm_vcpu_block(vcpu);
		kvm_vcpu_srcu_read_lock(vcpu);

		if (hv_timer)
			kvm_lapic_switch_to_hv_timer(vcpu);
	}

which looks sane, but would mean KVM never bounces out to handle whatever
event needs handling.

Side topic, usage of kvm_apic_accept_events() appears to be broken (though
nothing can trigger the bug).  If kvm_apic_accept_events() were to return
an -errno, then kvm_arch_vcpu_ioctl_run() would return '0' to userspace
without updating vcpu->run->exit_reason.

I think an easy fix is to drop the return value entirely and then WARN if
kvm_check_nested_events() returns something other than -EBUSY, e.g.

	if (is_guest_mode(vcpu)) {
		r = kvm_check_nested_events(vcpu);
		if (r < 0) {
			WARN_ON_ONCE(r != -EBUSY);
			return;
		}
	}
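To make the breakage concrete, the caller side looks roughly like this
(paraphrased from memory out of kvm_arch_vcpu_ioctl_run(), so the details
below may not be verbatim): on a negative return the error path sets 'r' to
0 without ever touching vcpu->run->exit_reason, so userspace would see a
"successful" KVM_RUN with a stale exit reason.

	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
		if (kvm_run->immediate_exit) {
			r = -EINTR;
			goto out;
		}
		kvm_vcpu_block(vcpu);
		if (kvm_apic_accept_events(vcpu) < 0) {
			/* Returns 0 to userspace with stale exit_reason. */
			r = 0;
			goto out;
		}
		...
	}

Making kvm_apic_accept_events() return void as above would delete that
error path entirely.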