Re: Inline assembly - how to get gcc to clear the full rcx register in x86-64 mode

Hi Ian,

I was confused by the Intel documentation, which states that pcmpistri returns its result in ECX. Stepping through it in a debugger, I verified that the upper 32 bits of RCX do get set to 0, as they do for any write to a 32-bit register in 64-bit mode.
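A minimal sketch of that debugger check as a C function, assuming x86-64 with SSE4.2 and 16 readable bytes at each pointer. The helper name pcmpistri_index is hypothetical; it fills all 64 bits of RCX with ones, runs pcmpistri with the same imm8 (0x18) as the strcmp quoted below, and returns the whole of RCX:

```c
#include <stdint.h>

/* Hypothetical helper. Assumes x86-64, SSE4.2, and 16 readable bytes at a and b. */
static uint64_t pcmpistri_index(const char *a, const char *b)
{
    uint64_t rcx = UINT64_MAX;          /* poison RCX, including bits 63:32 */
    __asm__ ("movdqu    (%1), %%xmm0\n\t"
             "pcmpistri $0x18, (%2), %%xmm0"   /* writes the index to ECX */
             : "+c"(rcx)
             : "r"(a), "r"(b)
             : "xmm0", "cc", "memory");
    return rcx;   /* bits 63:32 are now zero: the ECX write zero-extended */
}
```

With "abcdefghijklmno" and "abcXefghijklmno" (both exactly 16 bytes including the NUL) the first difference is at index 3, so the function returns 3 rather than 0xFFFFFFFF00000003; for two identical strings it returns 16, the "no difference" index.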

I also found that although a constraint cannot distinguish between the 32-bit and 64-bit forms of a register, GCC picks the form that matches the operand's C type (for example %rcx for a long).
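A short sketch of how the register name follows the operand type. Both functions below (hypothetical names) use the "c" constraint on a 64-bit operand: plain %0 prints as %rcx, while the x86 operand modifier %k0 from the GCC manual forces the 32-bit name %ecx (%q0 would force %rcx). A uint32_t operand would make plain %0 print as %ecx.

```c
#include <stdint.h>

/* 64-bit operand with constraint "c": %0 prints as %rcx. */
static uint64_t set_all_rcx(void)
{
    uint64_t v;
    __asm__ ("movq $-1, %0" : "=c"(v));    /* movq $-1, %rcx */
    return v;
}

/* Same operand, but %k0 asks GCC for the 32-bit name. */
static uint64_t set_ecx_only(void)
{
    uint64_t v;
    __asm__ ("movl $-1, %k0" : "=c"(v));   /* movl $-1, %ecx */
    return v;                              /* bits 63:32 end up zero */
}
```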

Thanks,
Jeroen

On 06/13/12 18:37, Ian Lance Taylor wrote:
Jeroen van Bemmel <jbemmel@xxxxxxxxx> writes:

> I have ported an SSE4 strcmp function from
> http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
> to GCC inline assembly:
>
> long __res;
> __asm__ __volatile__(
>          "sub        $16, %4                 \n"
>          "1:\n"
>          "add        $16, %4                 \n"
>          "movdqu     (%4), %%xmm0            \n"  // Could use any XMM, using register constraint "x"
>          // ".byte 0x48                      \n"  // REX prefix with REX.w=1, to get result in RCX
>          "pcmpistri  $0x18, (%4,%0), %%xmm0  \n"  // EQUAL_EACH(0x08) + NEGATIVE_POLARITY(0x10)
>          "ja         1b                      \n"
>          "jc         2f                      \n"
>          "xor        %0, %0                  \n"
>          "jmp        3f                      \n"  // XXX Extra jump could be avoided in pure asm
>          "2:\n"
>          "add        %4, %0                  \n"
>          "movzxb     (%0,%1), %0             \n"
>          "movzxb     (%4,%1), %4             \n"
>          "sub        %4, %0                  \n"
>          "3:\n"
>     : "=a"(__res), "=c"(cs) : "0"(cs-ct), "1"(0L), "r"(ct) : "xmm0" );
>
>     return (int) __res;
>
> The problem with this code is that "pcmpistri" returns its result in
> ECX (i.e. the lower 32 bits of RCX), while the "movzxb" instructions
> use the full RCX register.
> One solution is to insert a REX prefix with the REX.W bit set (any gas
> directive for this?)
Normally setting the low 32 bits of an x86 register will zero out the
upper 32 bits.  Is that not true for pcmpistri?

Otherwise, it sounds like you want the addressing mode (%rax,%ecx).
Does x86 really have that addressing mode?

Why not just zero extend %ecx to %rcx?
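A minimal sketch of that zero extension, assuming x86-64: copying ECX onto itself with movl writes a 32-bit destination, which clears bits 63:32 of RCX. The function name is hypothetical.

```c
#include <stdint.h>

/* movl %ecx, %ecx: 32-bit write, so the upper half of RCX is cleared. */
static uint64_t zero_extend_ecx(uint64_t x)
{
    __asm__ ("movl %%ecx, %%ecx" : "+c"(x));
    return x;
}
```

In plain C the same operation is `(uint64_t)(uint32_t)x`, which GCC typically compiles to a single movl.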

> However, I'd prefer to have gcc clear RCX at the beginning of the
> function. The above code loads the "c" register with 0, but the
> resulting asm code is
> "xorl    %ecx, %ecx"
That instruction will indeed set %ecx to zero.  Think about it.
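A small sketch of the point, assuming x86-64: start with every bit of RCX set, execute the same 32-bit xor GCC emitted for the "1"(0L) input, and read back all 64 bits. The function name is hypothetical.

```c
#include <stdint.h>

/* xorl writes ECX, so the whole of RCX ends up zero. */
static uint64_t rcx_after_xorl(void)
{
    uint64_t rcx = UINT64_MAX;
    __asm__ ("xorl %%ecx, %%ecx" : "+c"(rcx) : : "cc");
    return rcx;
}
```

The 32-bit form encodes in 2 bytes (31 C9) against 3 for xorq %rcx, %rcx (48 31 C9), which is why compilers use it to zero a 64-bit register.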

Ian
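For reference, a sketch of the strcmp routine quoted above with its operand constraints corrected; it is an illustration, not the code from strchr.com. The quoted version writes to %4, which is declared as an input-only "r"(ct) operand (GCC does not allow modifying input-only operands), and it reads memory without saying so. Here the pointer is a "+r" copy, a "memory" clobber is added, and RCX is not preloaded, since pcmpistri's 32-bit write already clears the upper half. The name sse42_strcmp is hypothetical. It assumes x86-64 with SSE4.2 and that the 15 bytes after each string's terminating NUL are readable, because the loop loads 16 bytes at a time.

```c
#include <stdint.h>

/* Hypothetical name. Assumes x86-64, SSE4.2, and 15 readable bytes past
   each terminating NUL (16-byte loads may run past the end of a string). */
static int sse42_strcmp(const char *cs, const char *ct)
{
    /* Offset from ct to cs, kept in RAX as in the quoted code; computed via
       uintptr_t to avoid subtracting pointers into different objects. */
    long res = (long)((uintptr_t)cs - (uintptr_t)ct);
    long idx;              /* pcmpistri index in RCX; no preload needed */
    const char *p = ct;    /* the asm advances, then overwrites, this copy */

    __asm__ (
        "sub       $16, %[p]\n\t"
        "1:\n\t"
        "add       $16, %[p]\n\t"
        "movdqu    (%[p]), %%xmm0\n\t"              /* 16 bytes of ct */
        "pcmpistri $0x18, (%[p],%[r]), %%xmm0\n\t"  /* vs 16 bytes of cs */
        "ja        1b\n\t"                /* CF=0, ZF=0: no difference, no NUL in cs */
        "jc        2f\n\t"                /* CF=1: first difference at index RCX */
        "xorl      %k[r], %k[r]\n\t"      /* equal: 32-bit xor clears all of RAX */
        "jmp       3f\n\t"
        "2:\n\t"
        "add       %[p], %[r]\n\t"        /* RAX = current chunk of cs */
        "movzbq    (%[r],%[i]), %[r]\n\t" /* cs[idx] */
        "movzbq    (%[p],%[i]), %[p]\n\t" /* ct[idx] */
        "sub       %[p], %[r]\n\t"        /* cs[idx] - ct[idx] */
        "3:"
        : [r] "+a"(res), [i] "=c"(idx), [p] "+r"(p)
        :
        : "xmm0", "cc", "memory");

    return (int)res;
}
```

Like strcmp, the result's sign comes from comparing the first differing bytes as unsigned char. Padded buffers such as `(char[32]){"hello"}` keep the over-read inside the object.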



