On Jun 18, 2015 3:41 AM, "Borislav Petkov" <bp@xxxxxxx> wrote: > > On Thu, Jun 18, 2015 at 03:59:06AM -0400, George Spelvin wrote: > > Before, the code to do rdtsc looked like: > > rdtsc > > shl $0x20,%rdx > > mov %eax,%eax > > or %rdx,%rax > > > > The "mov %eax,%eax" is required to clear the high 32 bits of %rax. > > > > By declaring low and high as 64-bit variables, the code is simplified to > > rdtsc > > shl $0x20,%rdx > > or %rdx,%rax > > > > Yes, it's a 2-byte instruction that's not on a critical path, but > > there are principles to be upheld. > > > > Every user of EAX_EDX_RET has been checked. I tried to check users of > > EAX_EDX_ARGS, but there weren't any, so I deleted it to be safe. > > > > (There's no benefit to making "high" 64 bits, but it was the simplest > > way to proceed.) > > > > Signed-off-by: George Spelvin <linux@xxxxxxxxxxx> > > --- > > I came across this while in the middle of hacking on some other code that > > isn't booting yet, so I haven't exactly tested it thoroughly. I *did* > > check both the Intel and AMD specifications to ensure that this was a safe > > optimization, and did user-space tests on both Intel and AMD processors. > > > > But it's simple enough I'm willing to risk public embarrassment. > > > > arch/x86/include/asm/msr.h | 7 +++---- > > 1 file changed, 3 insertions(+), 4 deletions(-) > > > > diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h > > index de36f22e..d6b8a80d 100644 > > --- a/arch/x86/include/asm/msr.h > > +++ b/arch/x86/include/asm/msr.h > > @@ -46,14 +46,13 @@ static inline unsigned long long native_read_tscp(unsigned int *aux) > > * it means rax *or* rdx. > > */ > > #ifdef CONFIG_X86_64 > > -#define DECLARE_ARGS(val, low, high) unsigned low, high > > -#define EAX_EDX_VAL(val, low, high) ((low) | ((u64)(high) << 32)) > > -#define EAX_EDX_ARGS(val, low, high) "a" (low), "d" (high) > > +/* Using 64-bit values saves one instruction clearing the high half of low */ > > +#define DECLARE_ARGS(val, low, high) unsigned long low, high > > +#define EAX_EDX_VAL(val, low, high) ((low) | (high) << 32) > > #define EAX_EDX_RET(val, low, high) "=a" (low), "=d" (high) > > #else > > #define DECLARE_ARGS(val, low, high) unsigned long long val > > #define EAX_EDX_VAL(val, low, high) (val) > > -#define EAX_EDX_ARGS(val, low, high) "A" (val) > > #define EAX_EDX_RET(val, low, high) "=A" (val) > > #endif > > Looks ok to me, I'll pick it up along with Andy's TSC cleanup. CCed. > Weird. Why does gcc think that an "=a" constraint on a 32-bit variable results in undefined high bits? I just compiled this: unsigned long long merge(unsigned int low, unsigned int high) { return (low) | ((unsigned long long)(high) << 32); } and I get this: merge: movq %rsi, %rax movl %edi, %edi salq $32, %rax orq %rdi, %rax ret Huh? Clang does the same thing. In fact, if I compile: extern unsigned int func(void); unsigned long long promote(void) { return func(); } I get: promote: subq $8, %rsp call func addq $8, %rsp movl %eax, %eax ret Now I'm extra confused. Do I just misunderstand the ABI, or are both clang and gcc missing an obvious optimization? --Andy -- To unsubscribe from this list: send the line "unsubscribe linux-x86_64" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html