On Wed, Aug 17, 2022, Paolo Bonzini wrote:
> On 8/17/22 01:34, Sean Christopherson wrote:
> > Isn't freeing up the return from kvm_vcpu_check_block() unnecessary?  Can't we
> > just do:
> >
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index 9f11b505cbee..ccb9f8bdeb18 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -10633,7 +10633,7 @@ static inline int vcpu_block(struct kvm_vcpu *vcpu)
> >  		if (hv_timer)
> >  			kvm_lapic_switch_to_hv_timer(vcpu);
> >
> > -		if (!kvm_check_request(KVM_REQ_UNHALT, vcpu))
> > +		if (!kvm_arch_vcpu_runnable(vcpu))
> >  			return 1;
> >  	}
> >
> > which IMO is more intuitive and doesn't require reworking halt-polling (again).
>
> This was my first idea indeed.  However I didn't like calling
> kvm_arch_vcpu_runnable() again, and "did it schedule()" seemed to be a less
> interesting result from kvm_vcpu_block() (and in fact kvm_vcpu_halt() does
> not bother passing it up the return chain).

The flip side of calling kvm_arch_vcpu_runnable() again is that KVM will
immediately wake the vCPU if it becomes runnable after kvm_vcpu_check_block().
The edge cases where the vCPU becomes runnable late are unlikely to truly
matter in practice, but on the other hand manually re-checking
kvm_arch_vcpu_runnable() means KVM gets both cases "right" (waited=true iff
the vCPU actually waited, vCPU awakened ASAP), whereas squishing the
information into the return of kvm_vcpu_check_block() means KVM gets both
cases "wrong" (waited=true even if schedule() was never called, vCPU left in
a non-running state even though it's runnable).

My only hesitation with calling kvm_arch_vcpu_runnable() again is that it
could be problematic if KVM somehow managed to consume the event that caused
kvm_vcpu_has_events() to return true, but I don't see how that could happen
without it being a KVM bug.

Ah, and if we do go with the explicit check, it should come with a comment to
call out that KVM needs to return up the stack, e.g.

		/*
		 * If KVM stopped blocking the vCPU but the vCPU is still not
		 * runnable, then there is a pending host event of some kind,
		 * e.g. a pending signal.  Return up the stack to service the
		 * pending event without changing the vCPU's activity state.
		 */
		if (!kvm_arch_vcpu_runnable(vcpu))
			return 1;

so that we don't end up combining the checks into:

	while (!kvm_arch_vcpu_runnable(vcpu)) {
		/*
		 * Switch to the software timer before halt-polling/blocking as
		 * the guest's timer may be a break event for the vCPU, and the
		 * hypervisor timer runs only when the CPU is in guest mode.
		 * Switch before halt-polling so that KVM recognizes an expired
		 * timer before blocking.
		 */
		hv_timer = kvm_lapic_hv_timer_in_use(vcpu);
		if (hv_timer)
			kvm_lapic_switch_to_sw_timer(vcpu);

		kvm_vcpu_srcu_read_unlock(vcpu);
		if (vcpu->arch.mp_state == KVM_MP_STATE_HALTED)
			kvm_vcpu_halt(vcpu);
		else
			kvm_vcpu_block(vcpu);
		kvm_vcpu_srcu_read_lock(vcpu);

		if (hv_timer)
			kvm_lapic_switch_to_hv_timer(vcpu);
	}

which looks sane, but would mean KVM never bounces out to handle whatever
event needs handling.

Side topic, usage of kvm_apic_accept_events() appears to be broken (though
nothing can trigger the bug).  If kvm_apic_accept_events() were to return
an -errno, then kvm_arch_vcpu_ioctl_run() would return '0' to userspace
without updating vcpu->run->exit_reason.

I think an easy fix is to drop the return value entirely and then WARN if
kvm_check_nested_events() returns something other than -EBUSY, e.g.

	if (is_guest_mode(vcpu)) {
		r = kvm_check_nested_events(vcpu);
		if (r < 0) {
			WARN_ON_ONCE(r != -EBUSY);
			return;
		}
	}
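To make the breakage concrete, the caller side looks roughly like this
(paraphrased from memory out of kvm_arch_vcpu_ioctl_run(), so the details
below may not be verbatim): on a negative return the error path sets 'r' to
0 without ever touching vcpu->run->exit_reason, so userspace would see a
"successful" KVM_RUN with a stale exit reason.

	if (unlikely(vcpu->arch.mp_state == KVM_MP_STATE_UNINITIALIZED)) {
		if (kvm_run->immediate_exit) {
			r = -EINTR;
			goto out;
		}
		kvm_vcpu_block(vcpu);
		if (kvm_apic_accept_events(vcpu) < 0) {
			/* Returns 0 to userspace with stale exit_reason. */
			r = 0;
			goto out;
		}
		...
	}

Making kvm_apic_accept_events() return void as above would delete that
error path entirely.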