Re: Accessing the vector elements

On Oct 23, 2006, at 11:42 PM, Ian Lance Taylor wrote:

Chris Lattner <clattner@xxxxxxxxx> writes:

But at my previous
company, we got significantly better generated code in some cases by
breaking that aliasing. In particular this aliasing forces a pointer
to a vector of char to alias char* and therefore alias everything.
But if the compiler breaks that aliasing, then your suggestion of
using a cast will fail in some cases.

The compiler shouldn't break that aliasing, and using a union will
give you poor code anyway.  LLVM will optimize either idiom into
element extract/insert operations, but I don't think GCC will in
any case.  If you care about performance with GCC, you have to use
ISA-specific builtins.

gcc generates efficient insert/extract with the union if the backend
is written correctly.  See the uses of vec_extract and vec_get in
extract_bit_field and store_bit_field in expmed.c.

Very interesting! Does this occur for Altivec/SSE? I've always seen GCC go through memory; it doesn't seem to promote the union to live in a register. I tried this:

#include <emmintrin.h>

void test1(__m128 V, float *P) {
   float Tmp;
   union {
     __m128 V;
     float A[4];
   } u;
   u.V = V;
   Tmp = u.A[1];
   *P = Tmp+Tmp;
}

void test2(__m128 V, float *P) {
   float Tmp;
   Tmp = ((float*)&V)[1];
   *P = Tmp+Tmp;
}

but GCC produces:

$ gcc foo.c -S -o - -O3 -msse3 -fomit-frame-pointer
        .text
        .align 4,0x90
.globl _test1
_test1:
        subl    $28, %esp
        movaps  %xmm0, (%esp)
        movl    32(%esp), %eax
        movss   4(%esp), %xmm0
        addss   %xmm0, %xmm0
        movss   %xmm0, (%eax)
        addl    $28, %esp
        ret
        .align 4,0x90
.globl _test2
_test2:
        subl    $28, %esp
        movaps  %xmm0, (%esp)
        movl    32(%esp), %eax
        movss   4(%esp), %xmm0
        addss   %xmm0, %xmm0
        movss   %xmm0, (%eax)
        addl    $28, %esp
        ret

I'd like to get something like this:

_test2:
        shufps $1, %xmm0, %xmm0
        addss %xmm0, %xmm0
        movl 4(%esp), %eax
        movss %xmm0, (%eax)
        ret

In what cases does GCC do this?
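
For comparison, here is a minimal sketch of the ISA-specific route mentioned earlier in the thread, written with the SSE intrinsics from <xmmintrin.h>. The name test3 is hypothetical and follows the two functions above. The shuffle immediate 1 is chosen to mirror the shufps in the desired output. Whether a particular GCC version emits exactly that instruction sequence is an assumption, not something shown here.

```c
#include <xmmintrin.h>

/* Hypothetical third variant: extract element 1 with explicit SSE
   intrinsics instead of going through a union or a pointer cast. */
void test3(__m128 V, float *P) {
   /* Immediate 1 moves element 1 of V into lane 0 of the result
      (the remaining lanes receive element 0 and are ignored). */
   __m128 Shuf = _mm_shuffle_ps(V, V, 1);
   /* Read lane 0 as a scalar float. */
   float Tmp = _mm_cvtss_f32(Shuf);
   *P = Tmp+Tmp;
}
```

Since the extraction is expressed as a register-to-register shuffle, nothing in the source requires the vector to be stored to the stack first.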

Thanks,

-Chris

