> On Oct 21, 2021, at 9:16 PM, Li,Rongqing <lirongqing@xxxxxxxxx> wrote:
>
> Ping
>
> -Li
>
>> -----Original Message-----
>> From: Li,Rongqing <lirongqing@xxxxxxxxx>
>> Sent: October 13, 2021, 17:43
>> To: x86@xxxxxxxxxx; kvm@xxxxxxxxxxxxxxx; Li,Rongqing <lirongqing@xxxxxxxxx>
>> Subject: [PATCH][v2] KVM: x86: directly call wbinvd for local cpu when emulate wbinvd
>>
>> Call wbinvd directly for the local CPU, instead of atomically setting
>> the local CPU in the mask with cpumask_set_cpu and then having
>> on_each_cpu_mask check whether the local CPU needs to run the function.
>>
>> on_each_cpu_mask is less efficient than smp_call_function_many, since
>> it disables preemption again and runs the call function after checking
>> the SCF_RUN_LOCAL flag; here wbinvd can simply be called directly.
>>
>> In effect, this reverts commit 2eec73437487 ("KVM: x86: Avoid issuing
>> wbinvd twice"): since smp_call_function_many skips the local CPU (as
>> described in c2162e13d6e2f), wbinvd is not issued twice.
>>
>> It also reverts commit c2162e13d6e2f ("KVM: X86: Fix missing local pCPU
>> when executing wbinvd on all dirty pCPUs"), which fixed the previous
>> patch; with that patch reverted, the fix is no longer needed.
>>
>> Signed-off-by: Li RongQing <lirongqing@xxxxxxxxx>
>> ---
>> diff v2: rewrite commit log
>>
>>  arch/x86/kvm/x86.c | 13 ++++++-------
>>  1 files changed, 6 insertions(+), 7 deletions(-)
>>
>> diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
>> index aabd3a2..28c4c72 100644
>> --- a/arch/x86/kvm/x86.c
>> +++ b/arch/x86/kvm/x86.c
>> @@ -6991,15 +6991,14 @@ static int kvm_emulate_wbinvd_noskip(struct kvm_vcpu *vcpu)
>>  		return X86EMUL_CONTINUE;
>>
>>  	if (static_call(kvm_x86_has_wbinvd_exit)()) {
>> -		int cpu = get_cpu();
>> -
>> -		cpumask_set_cpu(cpu, vcpu->arch.wbinvd_dirty_mask);
>> -		on_each_cpu_mask(vcpu->arch.wbinvd_dirty_mask,
>> +		preempt_disable();
>> +		smp_call_function_many(vcpu->arch.wbinvd_dirty_mask,
>>  				wbinvd_ipi, NULL, 1);
>> -		put_cpu();
>> +		preempt_enable();
>>  		cpumask_clear(vcpu->arch.wbinvd_dirty_mask);
>> -	} else
>> -		wbinvd();
>> +	}
>> +
>> +	wbinvd();
>>  	return X86EMUL_CONTINUE;
>>  }

KVM is none of my business, but on_each_cpu_mask() should be more efficient, since it runs wbinvd() concurrently on the local and remote CPUs (this is a relatively recent change I made). wbinvd() is an expensive operation and preempt_enable() is cheap, so there should not be a complicated tradeoff here.

The proposed change prevents wbinvd() from running concurrently, so theoretically it should cause a 2x slowdown (for this specific piece of code).