On Oct 23, 2006, at 11:42 PM, Ian Lance Taylor wrote:
Chris Lattner <clattner@xxxxxxxxx> writes:
But at my previous
company, we got significantly better generated code in some cases by
breaking that aliasing. In particular this aliasing forces a pointer
to a vector of char to alias char* and therefore alias everything.
But if the compiler breaks that aliasing, then your suggestion of
using a cast will fail in some cases.
The compiler shouldn't break that aliasing, and using a union will
give you poor code anyway. LLVM will optimize either idiom into
element extract/insert operations, but I don't think GCC will in any
case. If you care about performance with GCC, you have to use
ISA-specific builtins.
gcc generates efficient insert/extract with the union if the backend
is written correctly. See the uses of vec_extract and vec_get in
extract_bit_field and store_bit_field in expmed.c.
Very interesting! Does this occur for Altivec/SSE? I've always seen
GCC go to memory, it doesn't seem to promote the union to live in a
register. I tried this:
#include <emmintrin.h>

void test1(__m128 V, float *P) {
  float Tmp;
  union {
    __m128 V;
    float A[4];
  } u;
  u.V = V;
  Tmp = u.A[1];
  *P = Tmp+Tmp;
}

void test2(__m128 V, float *P) {
  float Tmp;
  Tmp = ((float*)&V)[1];
  *P = Tmp+Tmp;
}
but GCC produces:
$ gcc foo.c -S -o - -O3 -msse3 -fomit-frame-pointer
        .text
        .align 4,0x90
.globl _test1
_test1:
        subl    $28, %esp
        movaps  %xmm0, (%esp)
        movl    32(%esp), %eax
        movss   4(%esp), %xmm0
        addss   %xmm0, %xmm0
        movss   %xmm0, (%eax)
        addl    $28, %esp
        ret
        .align 4,0x90
.globl _test2
_test2:
        subl    $28, %esp
        movaps  %xmm0, (%esp)
        movl    32(%esp), %eax
        movss   4(%esp), %xmm0
        addss   %xmm0, %xmm0
        movss   %xmm0, (%eax)
        addl    $28, %esp
        ret
I'd like to get something like this:
_test2:
        shufps  $1, %xmm0, %xmm0
        addss   %xmm0, %xmm0
        movl    4(%esp), %eax
        movss   %xmm0, (%eax)
        ret
In what cases does GCC do this?
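
For comparison, here is a rough sketch of the ISA-specific route mentioned
earlier in the thread. It assumes an x86 target with SSE and the standard
intrinsics _mm_shuffle_ps and _mm_cvtss_f32 from <xmmintrin.h>. The names
extract1 and test3 are made up for illustration and are not part of the
test case above. What a given compiler version emits for this is not
something the sketch claims; it only expresses the shuffle-then-add shape
directly in the source:

```c
#include <xmmintrin.h>

/* Hypothetical helper (not from the thread): return element 1 of V
   using a register shuffle instead of a union or a pointer cast. */
static float extract1(__m128 V) {
    /* _MM_SHUFFLE(0, 0, 0, 1) evaluates to the immediate 1, so lane 0
       of the result holds element 1 of V; the other lanes are unused. */
    __m128 S = _mm_shuffle_ps(V, V, _MM_SHUFFLE(0, 0, 0, 1));
    /* Read lane 0 back as a scalar float. */
    return _mm_cvtss_f32(S);
}

/* Same computation as test1/test2: store twice element 1 through P. */
void test3(__m128 V, float *P) {
    float Tmp = extract1(V);
    *P = Tmp + Tmp;
}
```

This sidesteps the aliasing question entirely, at the cost of tying the
source to one instruction set.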
Thanks,
-Chris