On 9/18/2013 12:14 PM, Norbert Lange wrote:
Hi,
I wonder how one could get the compiler to generate the "movdqu"
instruction, since the vector extensions always seem to assume that
everything will be aligned to 16 bytes.
I tried using a packed struct, but this didn't help much. Of course one
can always resort to inline assembly, but this should not be necessary.
Compile with:
gcc -O2 -S -msse2 testvecs.c
--------------------------
I do see a movdqu, over a range of gcc (64-bit) versions from 4.4.6 to
4.9. Some of the compilers are complaining about mixed data type
arithmetic on lines 29 and 42.
I don't know whether it applies here, but splitting an unaligned 128-bit
memory move into two halves is likely to be the right thing on platforms
up through Intel Westmere. If you are building on a newer CPU, you would
want to specify -march=native so that the compiler optimizes for it
instead.
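
For reference, here is a minimal sketch of one way to ask for unaligned
access without inline assembly. It is not the testvecs.c from the
original post; the names v4si, v4si_u, load_u, store_u, load_memcpy and
add4_unaligned are made up for illustration. It assumes GCC's
vector_size, aligned and may_alias attributes, with the alignment of the
vector typedef lowered to 1 so the compiler cannot assume a 16-byte
aligned address.

```c
#include <stdint.h>
#include <string.h>

/* Four 32-bit ints in one 16-byte vector; GCC assumes 16-byte alignment. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical under-aligned variant: same layout, but alignment is
   lowered to 1 byte, and may_alias permits access through any pointer. */
typedef int32_t v4si_u __attribute__((vector_size(16), aligned(1), may_alias));

/* Load 16 bytes from an arbitrarily aligned address. */
static inline v4si load_u(const void *p)
{
    return *(const v4si_u *)p;
}

/* Store 16 bytes to an arbitrarily aligned address. */
static inline void store_u(void *p, v4si v)
{
    *(v4si_u *)p = v;
}

/* Alternative without the extra typedef: memcpy into a local vector.
   Optimizing compilers usually turn this into a single unaligned load. */
static inline v4si load_memcpy(const void *p)
{
    v4si v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Element-wise add of two unaligned 4-int blocks into dst. */
void add4_unaligned(void *dst, const void *a, const void *b)
{
    store_u(dst, load_u(a) + load_u(b));
}
```

Compiled the same way as above (gcc -O2 -S -msse2), the loads and stores
in add4_unaligned should no longer be emitted as aligned moves. Whether
you get a single movdqu or a split pair of 64-bit moves depends on the
compiler version and the -march/-mtune settings.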
--
Tim Prince