On 10/04/19 16:57, Sean Christopherson wrote:
> On Wed, Apr 10, 2019 at 12:55:53PM +0000, David Laight wrote:
>> From: Paolo Bonzini
>>> Sent: 10 April 2019 10:55
>>>
>>> This check will soon be done on every nested vmentry and vmexit,
>>> "parallelize" it using bitwise operations.
>>>
>>> Signed-off-by: Paolo Bonzini <pbonzini@xxxxxxxxxx>
>>> ---
>> ...
>>> diff --git a/arch/x86/kvm/x86.h b/arch/x86/kvm/x86.h
>>> index 28406aa1136d..7bc7ac9d2a44 100644
>>> --- a/arch/x86/kvm/x86.h
>>> +++ b/arch/x86/kvm/x86.h
>>> @@ -347,4 +347,12 @@ static inline void kvm_after_interrupt(struct kvm_vcpu *vcpu)
>>>  	__this_cpu_write(current_vcpu, NULL);
>>>  }
>>>
>>> +static inline bool kvm_pat_valid(u64 data)
>>> +{
>>> +	if (data & 0xF8F8F8F8F8F8F8F8)
>>> +		return false;
>>> +	/* 0, 1, 4, 5, 6, 7 are valid values. */
>>> +	return (data | ((data & 0x0202020202020202) << 1)) == data;
>>> +}
>>> +
>>
>> How about:
>> 	/*
>> 	 * Each byte must be 0, 1, 4, 5, 6 or 7.
>> 	 * Convert 001x to 011x then 100x so 2 and 3 fail the test.
>> 	 */
>> 	data |= (data ^ 0x0404040404040404ULL) + 0x0202020202020202ULL;
>> 	if (data & 0xF8F8F8F8F8F8F8F8ULL)
>> 		return false;
>
> Woah.  My vote is for Paolo's version as the separate checks allow the
> reader to walk through step-by-step.  The generated assembly isn't much
> different from a performance perspective since the TEST+JNE will not be
> taken in the fast path.
>
> Fancy:
>    0x000000000004844f <+255>:   movabs $0xf8f8f8f8f8f8f8f8,%rcx
>    0x0000000000048459 <+265>:   xor    %eax,%eax
>    0x000000000004845b <+267>:   test   %rcx,%rdx
>    0x000000000004845e <+270>:   jne    0x4848b <kvm_mtrr_valid+315>
>    0x0000000000048460 <+272>:   movabs $0x202020202020202,%rax
>    0x000000000004846a <+282>:   and    %rdx,%rax
>    0x000000000004846d <+285>:   add    %rax,%rax
>    0x0000000000048470 <+288>:   or     %rdx,%rax
>    0x0000000000048473 <+291>:   cmp    %rdx,%rax
>    0x0000000000048476 <+294>:   sete   %al
>    0x0000000000048479 <+297>:   retq
>
> Really fancy:
>    0x0000000000048447 <+247>:   movabs $0x404040404040404,%rcx
>    0x0000000000048451 <+257>:   movabs $0x202020202020202,%rax
>    0x000000000004845b <+267>:   xor    %rdx,%rcx
>    0x000000000004845e <+270>:   add    %rax,%rcx
>    0x0000000000048461 <+273>:   movabs $0xf8f8f8f8f8f8f8f8,%rax
>    0x000000000004846b <+283>:   or     %rcx,%rdx
>    0x000000000004846e <+286>:   test   %rax,%rdx
>    0x0000000000048471 <+289>:   sete   %al
>    0x0000000000048474 <+292>:   retq

Yeah, the three constants are expensive.  Too bad the really fancy
version sums twos and xors fours; if it were the opposite, it could have
used lea and then I would have chosen that one just for the coolness
factor.  (Quoting Avi, "mmu.c is designed around the fact that x86 has
an instruction to do 'x = 12 + 9*y'".)

Paolo
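
For anyone who wants to convince themselves that the two bitwise variants
discussed above accept exactly the same values, here is a minimal userspace
sketch. It is not kernel code: the function names, the byte-by-byte
reference check and the count_mismatches() helper are all made up for this
illustration, and uint64_t stands in for the kernel's u64. The "really
fancy" body is the snippet from the thread with the obviously intended
"return true" path folded into a single return.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Reference (hypothetical helper): check each of the eight bytes alone. */
static bool pat_valid_bytewise(uint64_t data)
{
	for (int i = 0; i < 8; i++) {
		uint8_t v = (data >> (8 * i)) & 0xff;

		/* 0, 1, 4, 5, 6, 7 are valid values. */
		if (v > 7 || v == 2 || v == 3)
			return false;
	}
	return true;
}

/* "Fancy": the version from the patch, two separate checks. */
static bool pat_valid_fancy(uint64_t data)
{
	if (data & 0xF8F8F8F8F8F8F8F8ULL)
		return false;
	/* If bit 1 of a byte is set, bit 2 must be set too (rejects 2 and 3). */
	return (data | ((data & 0x0202020202020202ULL) << 1)) == data;
}

/* "Really fancy": xor fours, add twos, so 2 and 3 carry into bit 3. */
static bool pat_valid_really_fancy(uint64_t data)
{
	data |= (data ^ 0x0404040404040404ULL) + 0x0202020202020202ULL;
	return !(data & 0xF8F8F8F8F8F8F8F8ULL);
}

/*
 * Hypothetical test helper: put every possible byte value in every byte
 * lane, on top of a few all-valid backgrounds, and count how often either
 * bitwise variant disagrees with the byte-by-byte reference.
 */
static unsigned int count_mismatches(void)
{
	static const uint64_t backgrounds[] = {
		0x0000000000000000ULL,
		0x0606060606060606ULL,
		0x0707070707070707ULL,
	};
	unsigned int mismatches = 0;

	for (size_t b = 0; b < sizeof(backgrounds) / sizeof(backgrounds[0]); b++) {
		for (int lane = 0; lane < 8; lane++) {
			uint64_t mask = 0xffULL << (8 * lane);

			for (uint64_t v = 0; v < 256; v++) {
				uint64_t data = (backgrounds[b] & ~mask) |
						(v << (8 * lane));
				bool want = pat_valid_bytewise(data);

				if (pat_valid_fancy(data) != want)
					mismatches++;
				if (pat_valid_really_fancy(data) != want)
					mismatches++;
			}
		}
	}
	return mismatches;
}
```

Calling count_mismatches() from a small main() is enough to exercise it. The
sweep only varies one byte at a time, so it is a sanity check rather than a
proof; the addition in the really fancy variant can only carry out of a byte
that already has high bits set, which is why a single-lane sweep covers the
interesting cases.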