Re: [PATCH] x86, msr: Save an instruction in amd64 rdtsc, rdmsr, and

"H. Peter Anvin" <hpa@xxxxxxxxx> · Thu, 18 Jun 2015 12:44:49 -0700

The ABI does not specify that the high bits are defined.  They will usually be zero, but aren't guaranteed to be.

On June 18, 2015 12:24:29 PM PDT, Andy Lutomirski <luto@xxxxxxxxxxxxxx> wrote:
>On Jun 18, 2015 3:41 AM, "Borislav Petkov" <bp@xxxxxxx> wrote:
>>
>> On Thu, Jun 18, 2015 at 03:59:06AM -0400, George Spelvin wrote:
>> > Before, the code to do rdtsc looked like:
>> >       rdtsc
>> >       shl     $0x20,%rdx
>> >       mov     %eax,%eax
>> >       or      %rdx,%rax
>> >
>> > The "mov %eax,%eax" is required to clear the high 32 bits of %rax.
>> >
>> > By declaring low and high as 64-bit variables, the code is
>simplified to
>> >       rdtsc
>> >       shl     $0x20,%rdx
>> >       or      %rdx,%rax
>> >
>> > Yes, it's a 2-byte instruction that's not on a critical path, but
>> > there are principles to be upheld.
>> >
>> > Every user of EAX_EDX_RET has been checked.  I tried to check users
>of
>> > EAX_EDX_ARGS, but there weren't any, so I deleted it to be safe.
>> >
>> > (There's no benefit to making "high" 64 bits, but it was the
>simplest
>> > way to proceed.)
>> >
>> > Signed-off-by: George Spelvin <linux@xxxxxxxxxxx>
>> > ---
>> > I came across this while in the middle of hacking on some other
>code that
>> > isn't booting yet, so I haven't exactly tested it thoroughly.  I
>*did*
>> > check both the Intel and AMD specifications to ensure that this was
>a safe
>> > optimization, and did user-space tests on both Intel and AMD
>processors.
>> >
>> > But it's simple enough I'm willing to risk public embarrassment.
>> >
>> >  arch/x86/include/asm/msr.h | 7 +++----
>> >  1 file changed, 3 insertions(+), 4 deletions(-)
>> >
>> > diff --git a/arch/x86/include/asm/msr.h
>b/arch/x86/include/asm/msr.h
>> > index de36f22e..d6b8a80d 100644
>> > --- a/arch/x86/include/asm/msr.h
>> > +++ b/arch/x86/include/asm/msr.h
>> > @@ -46,14 +46,13 @@ static inline unsigned long long
>native_read_tscp(unsigned int *aux)
>> >   * it means rax *or* rdx.
>> >   */
>> >  #ifdef CONFIG_X86_64
>> > -#define DECLARE_ARGS(val, low, high) unsigned low, high
>> > -#define EAX_EDX_VAL(val, low, high)  ((low) | ((u64)(high) << 32))
>> > -#define EAX_EDX_ARGS(val, low, high) "a" (low), "d" (high)
>> > +/* Using 64-bit values saves one instruction clearing the high
>half of low */
>> > +#define DECLARE_ARGS(val, low, high) unsigned long low, high
>> > +#define EAX_EDX_VAL(val, low, high)  ((low) | (high) << 32)
>> >  #define EAX_EDX_RET(val, low, high)  "=a" (low), "=d" (high)
>> >  #else
>> >  #define DECLARE_ARGS(val, low, high) unsigned long long val
>> >  #define EAX_EDX_VAL(val, low, high)  (val)
>> > -#define EAX_EDX_ARGS(val, low, high) "A" (val)
>> >  #define EAX_EDX_RET(val, low, high)  "=A" (val)
>> >  #endif
>>
>> Looks ok to me, I'll pick it up along with Andy's TSC cleanup. CCed.
>>
>
>Weird.  Why does gcc think that an "=a" constraint on a 32-bit
>variable results in undefined high bits?
>
>I just compiled this:
>
>unsigned long long merge(unsigned int low, unsigned int high)
>{
>    return (low) | ((unsigned long long)(high) << 32);
>}
>
>and I get this:
>
>merge:
>        movq    %rsi, %rax
>        movl    %edi, %edi
>        salq    $32, %rax
>        orq     %rdi, %rax
>        ret
>
>Huh?  Clang does the same thing.
>
>In fact, if I compile:
>
>extern unsigned int func(void);
>
>unsigned long long promote(void)
>{
>    return func();
>}
>
>I get:
>
>promote:
>        subq    $8, %rsp
>        call    func
>        addq    $8, %rsp
>        movl    %eax, %eax
>        ret
>
>Now I'm extra confused.  Do I just misunderstand the ABI, or are both
>clang and gcc missing an obvious optimization?
>
>--Andy

-- 
Sent from my mobile phone.  Please pardon brevity and lack of formatting.
--
To unsubscribe from this list: send the line "unsubscribe linux-x86_64" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html