Inline assembly - how to get gcc to clear the full rcx register in x86-64 mode

Hi,

I have ported an SSE4.2 strcmp function from http://www.strchr.com/strcmp_and_strlen_using_sse_4.2
to GCC inline assembly:

long __res;
__asm__ __volatile__(
        "sub        $16, %4                    \n"
        "1:\n"
        "add        $16, %4                     \n"
        "movdqu     (%4), %%xmm0                \n" // Could use any XMM, using register constraint "x"
        // ".byte 0x48                          \n" // REX prefix with REX.w=1, to get result in RCX
        "pcmpistri  $0x18, (%4,%0), %%xmm0      \n" // EQUAL_EACH(0x08) + NEGATIVE_POLARITY(0x10)
        "ja 1b                                \n"
        "jc 2f                                 \n"
        "xor %0, %0                     \n"
        "jmp 3f                                \n" // XXX Extra jump could be avoided in pure asm
        "2:\n"
        "add %4, %0                     \n"
        "movzxb (%0,%1), %0      \n"
        "movzxb (%4,%1), %4      \n"
        "sub %4, %0                     \n"
        "3:\n"
    : "=a"(__res), "=c"(cs) : "0"(cs-ct), "1"(0L), "r"(ct) : "xmm0" );

    return (int) __res;
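
For readers following the control flow, the loop above can be modelled in plain C. This is only a sketch of the intended behaviour, not the ported routine itself: the names `block_result`, `compare_block` and `strcmp_model` are hypothetical, and the flag handling reflects my reading of pcmpistri with immediate 0x18 (CF set when a differing position is found, ZF set when the block read from cs contains a NUL, ECX holding the index of the first difference). The model stops at the first difference or NUL so that it never reads past the end of either string.

```c
#include <assert.h>

/* Hypothetical model of the state the asm branches look at after one
   pcmpistri $0x18 step. Only the fields the branches need are filled in. */
struct block_result {
    int index;  /* models ECX: first differing position, 16 if none */
    int carry;  /* models CF: a differing position was found */
    int zero;   /* models ZF: the block from cs holds a NUL */
};

/* Compare up to 16 bytes, stopping early at a difference or a shared NUL. */
static struct block_result compare_block(const unsigned char *cs,
                                         const unsigned char *ct)
{
    struct block_result r = { 16, 0, 0 };
    for (int i = 0; i < 16; i++) {
        if (cs[i] != ct[i]) {       /* also covers "one string ended first" */
            r.index = i;
            r.carry = 1;
            return r;
        }
        if (cs[i] == 0) {           /* both strings end here: they are equal */
            r.zero = 1;
            return r;
        }
    }
    return r;                       /* 16 equal, non-NUL bytes */
}

/* Scalar equivalent of the asm loop, with the matching branch noted. */
static int strcmp_model(const char *cs, const char *ct)
{
    const unsigned char *a = (const unsigned char *)cs;
    const unsigned char *b = (const unsigned char *)ct;
    for (;;) {
        struct block_result r = compare_block(a, b);
        if (!r.carry && !r.zero) {  /* ja 1b: no difference, no NUL */
            a += 16;
            b += 16;
            continue;
        }
        if (r.carry) {              /* jc 2f: return byte difference */
            assert(r.index < 16);
            return (int)a[r.index] - (int)b[r.index];
        }
        return 0;                   /* xor %0, %0: strings are equal */
    }
}
```

The index returned by `compare_block` is always in the range 0..16, which is why only the low bits of the count register carry information at the point where the asm uses it as an offset.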

The problem with this code is that "pcmpistri" returns its result in ECX (i.e. the lower 32 bits of RCX), while the "movzxb" instructions use the full RCX register as an index. One solution is to insert a REX prefix with the REX.W bit set (is there a gas directive for this?).

However, I'd prefer to have gcc clear RCX at the beginning of the function. The above code loads the "c" register with 0, but the resulting asm code is
"xorl    ecx, ecx"

Is this a bug in GCC? If not, how do I get it to clear the full RCX without doing it 'manually' in the asm block?

Thanks,
Jeroen

