RE: [PATCH] KVM: x86: optimize check for valid PAT value

David Laight <David.Laight@xxxxxxxxxx> · Mon, 15 Apr 2019 09:03:05 +0000

From: Paolo Bonzini
> Sent: 15 April 2019 09:12
> On 11/04/19 11:06, David Laight wrote:
> > It may be possible to generate shorter code that executes just as
> > fast by generating a single constant and deriving the others from it.
> > - generate 4s - needed first
> > - shift right 2 to get 1s (in parallel with the xor)
> > - use lea to get 6s (in parallel with an lea to do the add)
> > - invert the 1s to get FEs (also in parallel with the add)
> > - xor the FEs with the 6s to get F8s (in parallel with the or)
> > - and/test for the result

That version needs an extra register move I hadn't allowed for.
It is also impossible to stop gcc folding constant expressions
without an asm nop on a register.
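
As a sanity check on the arithmetic in the quoted list, here is a minimal
sketch of the constant derivation in C. The struct and function names are
made up for illustration, and gcc will of course fold all of this at
compile time, so it only shows that the values come out right, not what
code gets generated.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the byte-replicated constants. */
struct pat_consts {
	uint64_t c4;	/* 0x04 in every byte */
	uint64_t c1;	/* 0x01 in every byte */
	uint64_t c6;	/* 0x06 in every byte */
	uint64_t cfe;	/* 0xFE in every byte */
	uint64_t cf8;	/* 0xF8 in every byte */
};

/* Derive every constant from the 4s, following the quoted steps. */
static struct pat_consts derive_consts(void)
{
	struct pat_consts k;

	k.c4 = 0x0404040404040404ULL;	/* generate 4s first */
	k.c1 = k.c4 >> 2;		/* shift right 2 to get 1s */
	k.c6 = k.c4 + 2 * k.c1;		/* base + index*2, the shape of an lea */
	k.cfe = ~k.c1;			/* invert the 1s to get FEs */
	k.cf8 = k.cfe ^ k.c6;		/* xor the FEs with the 6s to get F8s */
	return k;
}

/* Check the derived values against the literal constants. */
static void check_consts(void)
{
	struct pat_consts k = derive_consts();

	assert(k.c1 == 0x0101010101010101ULL);
	assert(k.c6 == 0x0606060606060606ULL);
	assert(k.cfe == 0xFEFEFEFEFEFEFEFEULL);
	assert(k.cf8 == 0xF8F8F8F8F8F8F8F8ULL);
}
```

No byte ever carries into its neighbour here: the shift moves bit 2 down
to bit 0 within the same byte, and 0x04 + 2 * 0x01 stays below 0x100.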

> FWIW, here is yet another way to do it:
> 
> /* Change 6/7 to 4/5 */
> data &= ~((data & 0x0404040404040404ULL) >> 1);
> /* Only allow 0/1/4/5 now */
> return !(data & 0xFAFAFAFAFAFAFAFAULL);
> 
> movabs $0x404040404040404, %rcx
> andq   %rdx, %rcx
> shrq   %rcx
> notq   %rcx
> movabs $0xFAFAFAFAFAFAFAFA, %rax
> andq   %rcx, %rdx
> test   %rax, %rdx

Fewer opcode bytes, but 5 dependent instructions
(assuming the first constant can be executed in parallel
with an earlier instruction).
I think mine had only 4 dependent instructions.

All these are far faster than the loop...
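
For comparison, a minimal sketch that checks the quoted C against a
byte-at-a-time loop. The loop is my assumption of what the reference
check looks like (each byte must be one of 0, 1, 4, 5, 6 or 7); the
function names are hypothetical and this is not the kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed reference: every byte must be 0, 1, 4, 5, 6 or 7. */
static bool pat_valid_loop(uint64_t data)
{
	for (int i = 0; i < 8; i++) {
		uint8_t t = (data >> (i * 8)) & 0xff;

		if (t != 0 && t != 1 && t != 4 && t != 5 && t != 6 && t != 7)
			return false;
	}
	return true;
}

/* The quoted mask version. */
static bool pat_valid_mask(uint64_t data)
{
	/* Change 6/7 to 4/5: where bit 2 is set, clear bit 1 of that byte */
	data &= ~((data & 0x0404040404040404ULL) >> 1);
	/* Only allow 0/1/4/5 now */
	return !(data & 0xFAFAFAFAFAFAFAFAULL);
}

/* Compare both versions for every byte value in every byte position. */
static void pat_selfcheck(void)
{
	for (int pos = 0; pos < 8; pos++) {
		for (uint64_t v = 0; v < 256; v++) {
			uint64_t data = v << (pos * 8);

			assert(pat_valid_loop(data) == pat_valid_mask(data));
		}
	}
}
```

The bytes are independent in both versions, so testing one non-zero
byte at a time is enough to cover the per-byte logic.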

	David
