Re: [PATCH] x86, msr: Save an instruction in amd64 rdtsc, rdmsr, and

Andy Lutomirski <luto@xxxxxxxxxxxxxx> · Thu, 18 Jun 2015 12:24:29 -0700

On Jun 18, 2015 3:41 AM, "Borislav Petkov" <bp@xxxxxxx> wrote:
>
> On Thu, Jun 18, 2015 at 03:59:06AM -0400, George Spelvin wrote:
> > Before, the code to do rdtsc looked like:
> >       rdtsc
> >       shl     $0x20,%rdx
> >       mov     %eax,%eax
> >       or      %rdx,%rax
> >
> > The "mov %eax,%eax" is required to clear the high 32 bits of %rax.
> >
> > By declaring low and high as 64-bit variables, the code is simplified to
> >       rdtsc
> >       shl     $0x20,%rdx
> >       or      %rdx,%rax
> >
> > Yes, it's a 2-byte instruction that's not on a critical path, but
> > there are principles to be upheld.
> >
> > Every user of EAX_EDX_RET has been checked.  I tried to check users of
> > EAX_EDX_ARGS, but there weren't any, so I deleted it to be safe.
> >
> > (There's no benefit to making "high" 64 bits, but it was the simplest
> > way to proceed.)
> >
> > Signed-off-by: George Spelvin <linux@xxxxxxxxxxx>
> > ---
> > I came across this while in the middle of hacking on some other code that
> > isn't booting yet, so I haven't exactly tested it thoroughly.  I *did*
> > check both the Intel and AMD specifications to ensure that this was a safe
> > optimization, and did user-space tests on both Intel and AMD processors.
> >
> > But it's simple enough I'm willing to risk public embarrassment.
> >
> >  arch/x86/include/asm/msr.h | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/x86/include/asm/msr.h b/arch/x86/include/asm/msr.h
> > index de36f22e..d6b8a80d 100644
> > --- a/arch/x86/include/asm/msr.h
> > +++ b/arch/x86/include/asm/msr.h
> > @@ -46,14 +46,13 @@ static inline unsigned long long native_read_tscp(unsigned int *aux)
> >   * it means rax *or* rdx.
> >   */
> >  #ifdef CONFIG_X86_64
> > -#define DECLARE_ARGS(val, low, high) unsigned low, high
> > -#define EAX_EDX_VAL(val, low, high)  ((low) | ((u64)(high) << 32))
> > -#define EAX_EDX_ARGS(val, low, high) "a" (low), "d" (high)
> > +/* Using 64-bit values saves one instruction clearing the high half of low */
> > +#define DECLARE_ARGS(val, low, high) unsigned long low, high
> > +#define EAX_EDX_VAL(val, low, high)  ((low) | (high) << 32)
> >  #define EAX_EDX_RET(val, low, high)  "=a" (low), "=d" (high)
> >  #else
> >  #define DECLARE_ARGS(val, low, high) unsigned long long val
> >  #define EAX_EDX_VAL(val, low, high)  (val)
> > -#define EAX_EDX_ARGS(val, low, high) "A" (val)
> >  #define EAX_EDX_RET(val, low, high)  "=A" (val)
> >  #endif
>
> Looks ok to me, I'll pick it up along with Andy's TSC cleanup. CCed.
>

Weird.  Why does gcc think that an "=a" constraint on a 32-bit
variable results in undefined high bits?

I just compiled this:

unsigned long long merge(unsigned int low, unsigned int high)
{
    return (low) | ((unsigned long long)(high) << 32);
}

and I get this:

merge:
        movq    %rsi, %rax
        movl    %edi, %edi
        salq    $32, %rax
        orq     %rdi, %rax
        ret

Huh?  Clang does the same thing.

In fact, if I compile:

extern unsigned int func(void);

unsigned long long promote(void)
{
    return func();
}

I get:

promote:
        subq    $8, %rsp
        call    func
        addq    $8, %rsp
        movl    %eax, %eax
        ret

Now I'm extra confused.  Do I just misunderstand the ABI, or are both
clang and gcc missing an obvious optimization?

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-x86_64" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html