* Rik van Riel (riel@xxxxxxxxxx) wrote: > --- a/virt/kvm/kvm_main.c > +++ b/virt/kvm/kvm_main.c > @@ -1880,18 +1880,53 @@ void kvm_resched(struct kvm_vcpu *vcpu) > } > EXPORT_SYMBOL_GPL(kvm_resched); > > -void kvm_vcpu_on_spin(struct kvm_vcpu *vcpu) > +void kvm_vcpu_on_spin(struct kvm_vcpu *me) > { > - ktime_t expires; > - DEFINE_WAIT(wait); > + struct kvm *kvm = me->kvm; > + struct kvm_vcpu *vcpu; > + int last_boosted_vcpu = me->kvm->last_boosted_vcpu; s/me->// > + int first_round = 1; > + int i; > > - prepare_to_wait(&vcpu->wq, &wait, TASK_INTERRUPTIBLE); > + me->spinning = 1; > + > + /* > + * We boost the priority of a VCPU that is runnable but not > + * currently running, because it got preempted by something > + * else and called schedule in __vcpu_run. Hopefully that > + * VCPU is holding the lock that we need and will release it. > + * We approximate round-robin by starting at the last boosted VCPU. > + */ > + again: > + kvm_for_each_vcpu(i, vcpu, kvm) { > + struct task_struct *task = vcpu->task; > + if (first_round && i < last_boosted_vcpu) { > + i = last_boosted_vcpu; > + continue; > + } else if (!first_round && i > last_boosted_vcpu) > + break; > + if (vcpu == me) > + continue; > + if (vcpu->spinning) > + continue; > + if (!task) > + continue; > + if (waitqueue_active(&vcpu->wq)) > + continue; > + if (task->flags & PF_VCPU) > + continue; I wonder whether, if you set vcpu->task in sched_out and then NULL it in sched_in, you'd get what you want and could simplify the checks. Basically that would be only the preempted runnable vcpus. > + kvm->last_boosted_vcpu = i; > + yield_to(task); Just trying to think of ways to be sure this doesn't become just yield() (although I think PF_VCPU is enough to catch that). -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html