Re: [PATCH v3] KVM: x86: use wrpkru directly in kvm_load_{guest|host}_xsave_state

Dave Hansen <dave.hansen@xxxxxxxxx> · Wed, 12 May 2021 11:33:21 -0700

On 5/12/21 12:41 AM, Peter Zijlstra wrote:
> On Tue, May 11, 2021 at 01:05:02PM -0400, Jon Kohler wrote:
>> diff --git a/arch/x86/include/asm/fpu/internal.h b/arch/x86/include/asm/fpu/internal.h
>> index 8d33ad80704f..5bc4df3a4c27 100644
>> --- a/arch/x86/include/asm/fpu/internal.h
>> +++ b/arch/x86/include/asm/fpu/internal.h
>> @@ -583,7 +583,13 @@ static inline void switch_fpu_finish(struct fpu *new_fpu)
>>  		if (pk)
>>  			pkru_val = pk->pkru;
>>  	}
>> -	__write_pkru(pkru_val);
>> +
>> +	/*
>> +	 * WRPKRU is relatively expensive compared to RDPKRU.
>> +	 * Avoid WRPKRU when it would not change the value.
>> +	 */
>> +	if (pkru_val != rdpkru())
>> +		wrpkru(pkru_val);
> Just wondering; why aren't we having that in a per-cpu variable? The
> usual per-cpu MSR shadow approach avoids issuing any 'special' ops
> entirely.

It could be a per-cpu variable.  When I wrote this originally I figured
that a rdpkru would be cheaper than a load from memory (even per-cpu
memory).

But, now that I think about it, assuming that 'prku_val' is in %rdi, doing:

	cmp	%gs:0x1234, %rdi

might end up being cheaper than clobbering a *pair* of GPRs with rdpkru:

	xor    %ecx,%ecx
	rdpkru
	cmp	%rax, %rdi

I'm too lazy to go figure out what would be faster in practice, though.
 Does anyone care?