The ABI does not specify that the high bits are defined. They will usually be zero, but aren't guaranteed to be. On June 18, 2015 12:24:29 PM PDT, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote: >On Jun 18, 2015 3:41 AM, "Borislav Petkov" <bp@xxxxxxx> wrote: >> >> On Thu, Jun 18, 2015 at 03:59:06AM -0400, George Spelvin wrote: >> > Before, the code to do rdtsc looked like: >> > rdtsc >> > shl $0x20,%rdx >> > mov %eax,%eax >> > or %rdx,%rax >> > >> > The "mov %eax,%eax" is required to clear the high 32 bits of %rax. >> > >> > By declaring low and high as 64-bit variables, the code is >simplified to >> > rdtsc >> > shl $0x20,%rdx >> > or %rdx,%rax >> > >> > Yes, it's a 2-byte instruction that's not on a critical path, but >> > there are principles to be upheld. >> > >> > Every user of EAX_EDX_RET has been checked. I tried to check users >of >> > EAX_EDX_ARGS, but there weren't any, so I deleted it to be safe. >> > >> > (There's no benefit to making "high" 64 bits, but it was the >simplest >> > way to proceed.) >> > >> > Signed-off-by: George Spelvin <linux@xxxxxxxxxxx> >> > --- >> > I came across this while in the middle of hacking on some other >code that >> > isn't booting yet, so I haven't exactly tested it thoroughly. I >*did* >> > check both the Intel and AMD specifications to ensure that this was >a safe >> > optimization, and did user-space tests on both Intel and AMD >processors. >> > >> > But it's simple enough I'm willing to risk public embarrassment. >> > >> > arch/x86/include/asm/msr.h | 7 +++---- >> > 1 file changed, 3 insertions(+), 4 deletions(-) >> > >> > diff --git a/arch/x86/include/asm/msr.h >b/arch/x86/include/asm/msr.h >> > index de36f22e..d6b8a80d 100644 >> > --- a/arch/x86/include/asm/msr.h >> > +++ b/arch/x86/include/asm/msr.h >> > @@ -46,14 +46,13 @@ static inline unsigned long long >native_read_tscp(unsigned int *aux) >> > * it means rax *or* rdx. >> > */ >> > #ifdef CONFIG_X86_64 >> > -#define DECLARE_ARGS(val, low, high) unsigned low, high >> > -#define EAX_EDX_VAL(val, low, high) ((low) | ((u64)(high) << 32)) >> > -#define EAX_EDX_ARGS(val, low, high) "a" (low), "d" (high) >> > +/* Using 64-bit values saves one instruction clearing the high >half of low */ >> > +#define DECLARE_ARGS(val, low, high) unsigned long low, high >> > +#define EAX_EDX_VAL(val, low, high) ((low) | (high) << 32) >> > #define EAX_EDX_RET(val, low, high) "=a" (low), "=d" (high) >> > #else >> > #define DECLARE_ARGS(val, low, high) unsigned long long val >> > #define EAX_EDX_VAL(val, low, high) (val) >> > -#define EAX_EDX_ARGS(val, low, high) "A" (val) >> > #define EAX_EDX_RET(val, low, high) "=A" (val) >> > #endif >> >> Looks ok to me, I'll pick it up along with Andy's TSC cleanup. CCed. >> > >Weird. Why does gcc think that an "=a" constraint on a 32-bit >variable results in undefined high bits? > >I just compiled this: > >unsigned long long merge(unsigned int low, unsigned int high) >{ > return (low) | ((unsigned long long)(high) << 32); >} > >and I get this: > >merge: > movq %rsi, %rax > movl %edi, %edi > salq $32, %rax > orq %rdi, %rax > ret > >Huh? Clang does the same thing. > >In fact, if I compile: > >extern unsigned int func(void); > >unsigned long long promote(void) >{ > return func(); >} > >I get: > >promote: > subq $8, %rsp > call func > addq $8, %rsp > movl %eax, %eax > ret > >Now I'm extra confused. Do I just misunderstand the ABI, or are both >clang and gcc missing an obvious optimization? > >--Andy -- Sent from my mobile phone. Please pardon brevity and lack of formatting. -- To unsubscribe from this list: send the line "unsubscribe linux-x86_64" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html