On Sat, May 08, 2021, Wanpeng Li wrote:
> From: Wanpeng Li <wanpengli@xxxxxxxxxxx>
>
> In undercommitted scenarios, vCPUs can get scheduled easily;
> kvm_vcpu_yield_to() adds extra overhead, and we can observe a lot of
> races between vcpu->ready being true and the yield failing because
> p->state is TASK_RUNNING.  Let's bail out in such scenarios by checking
> the length of the current CPU's runqueue.
>
> Signed-off-by: Wanpeng Li <wanpengli@xxxxxxxxxxx>
> ---
>  arch/x86/kvm/x86.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> index 5bd550e..c0244a6 100644
> --- a/arch/x86/kvm/x86.c
> +++ b/arch/x86/kvm/x86.c
> @@ -8358,6 +8358,9 @@ static void kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
>  	struct kvm_vcpu *target = NULL;
>  	struct kvm_apic_map *map;
>
> +	if (single_task_running())
> +		goto no_yield;
> +

Hmm, could we push the result of kvm_sched_yield() down into the guest?
Currently the guest bails after the first attempt, which is perfect for
this scenario, but it seems like it would make sense to keep trying to
yield if there are multiple preempted vCPUs and the "problem" was with
the target.  E.g.

	/*
	 * Make sure other vCPUs get a chance to run if they need to.  Yield at
	 * most once, and stop trying to yield if the VMM says yielding isn't
	 * going to happen.
	 */
	for_each_cpu(cpu, mask) {
		if (vcpu_is_preempted(cpu)) {
			r = kvm_hypercall1(KVM_HC_SCHED_YIELD,
					   per_cpu(x86_cpu_to_apicid, cpu));
			if (r != -EBUSY)
				break;
		}
	}

Unrelated to this patch, but it's the first time I've really looked at
the guest side of directed yield...  Wouldn't it also make sense for the
guest side to hook .send_call_func_single_ipi?

>  	vcpu->stat.directed_yield_attempted++;

Shouldn't directed_yield_attempted be incremented in this case?  It
doesn't seem fundamentally different than the case where the target was
scheduled in between the guest's check and the host's processing of the
yield request.  In both instances, the guest did indeed attempt to
yield.

>  	rcu_read_lock();
> --
> 2.7.4
>
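
To make the first suggestion concrete, here's a rough, completely
untested sketch (mine, not part of Wanpeng's patch) of what the host
side could look like if kvm_sched_yield() reported its result and the
hypercall relayed it, so the guest can tell "this target was a bad
pick, try another" apart from "yielding isn't going to help right now":

	static int kvm_sched_yield(struct kvm_vcpu *vcpu, unsigned long dest_id)
	{
		struct kvm_vcpu *target = NULL;
		struct kvm_apic_map *map;

		vcpu->stat.directed_yield_attempted++;

		/* Undercommitted: yielding won't help, tell the guest to stop. */
		if (single_task_running())
			return 0;

		rcu_read_lock();
		map = rcu_dereference(vcpu->kvm->arch.apic_map);

		if (likely(map) && dest_id <= map->max_apic_id &&
		    map->phys_map[dest_id])
			target = map->phys_map[dest_id]->vcpu;

		rcu_read_unlock();

		/* Target-specific failure; the guest may want to retry another vCPU. */
		if (!target || !READ_ONCE(target->ready) ||
		    kvm_vcpu_yield_to(target) <= 0)
			return -EBUSY;

		vcpu->stat.directed_yield_successful++;
		return 0;
	}

with the KVM_HC_SCHED_YIELD case in kvm_emulate_hypercall() doing

	case KVM_HC_SCHED_YIELD:
		ret = kvm_sched_yield(vcpu, a0);
		break;

instead of unconditionally setting ret = 0.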
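
And on the .send_call_func_single_ipi question, a similarly untested
sketch of what the guest-side hook might look like, mirroring the
existing kvm_smp_send_call_func_ipi():

	static void kvm_smp_send_call_func_single_ipi(int cpu)
	{
		native_send_call_func_single_ipi(cpu);

		/* Make sure the target vCPU gets a chance to run if it needs to. */
		if (vcpu_is_preempted(cpu))
			kvm_hypercall1(KVM_HC_SCHED_YIELD,
				       per_cpu(x86_cpu_to_apicid, cpu));
	}

wired up next to the existing hook, i.e. setting
smp_ops.send_call_func_single_ipi when PV sched yield is supported.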
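
Note that the first sketch above also bumps the stat before the early
bail, i.e.

	vcpu->stat.directed_yield_attempted++;

	if (single_task_running())
		return 0;

which is what I'd expect if the undercommitted case is treated the same
as losing the race with the target being scheduled in.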