On 9/18/2013 7:01 PM, Norbert Lange wrote:
> Hello Tim,
> can you specify which versions, maybe post the command line, or try
> compiling for 32-bit (-m32 switch)?
> Also, I don't understand the comment about splitting. To avoid
> misunderstanding: the generated code segfaults on my Athlon X2, so it's
> not a question of optimal code, but of actually working code.
> I'm unable to generate the right instruction, and I don't exactly know
> why it should differ between versions (... except bugs of course...).
> And I just want to know the right way to force unaligned loads,
> without inline assembly.
> Btw: The code doesn't compile on gcc < 4.7 as I just realised - can't
> multiply vectors with scalars on older versions.
I wasn't even certain which of my gcc installations had 32-bit
counterparts, but Red Hat gcc 4.4.6 appeared to accept your code for
-m64 and reject it for -m32. Intel icc, which shares a lot with the
active gcc, also rejected your code. Many people here advocate options
such as -pedantic -Wall to increase the number of warnings, so that you
get those warnings even where gcc accepts your code.
I thought the X2 could accept nearly all normal SSE2 code (the original
Turion didn't), but I guess you want to test its limits. Now that you've
revealed your actual target, someone might suggest a more appropriate
architecture option (-march). Did you read about the errata for this
instruction on your chip?
http://support.amd.com/us/Processor_TechDocs/25759.pdf
Splitting unaligned 128-bit moves into separate 64-bit moves was a
common tactic likely to improve performance on CPUs prior to AMD
Barcelona and Intel Nehalem (not to mention avoiding bugs in the
hardware implementation). It probably didn't hurt to split the
instruction explicitly on a CPU where the hardware would split it anyway
(I thought this might be true of the X2). Even with Intel Westmere there
were situations where splitting might improve performance. So gcc can't
be faulted for making that translation when you didn't tell it to
compile for a more recent CPU, or when you specify a target which is
known to have problems with certain instructions.
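
On forcing unaligned loads without inline assembly, here is a minimal
sketch of one portable approach, assuming gcc's generic vector
extension: copy the 16 bytes through memcpy and let the compiler pick
the move instructions. The names v4sf, load_unaligned, store_unaligned
and scale_floats are made up for illustration and are not taken from
your code. Whether the copy becomes a single unaligned 128-bit move or a
pair of 64-bit moves depends on the gcc version and on -march/-mtune.

```c
#include <stddef.h>
#include <string.h>

/* gcc generic vector extension: four packed floats in 16 bytes. */
typedef float v4sf __attribute__((vector_size(16)));

/* Load 16 bytes from an address with no alignment guarantee.
 * The fixed-size memcpy tells the compiler nothing about alignment,
 * so it must not emit an aligned-only move here. */
static inline v4sf load_unaligned(const void *p)
{
    v4sf v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store 16 bytes to an address with no alignment guarantee. */
static inline void store_unaligned(void *p, v4sf v)
{
    memcpy(p, &v, sizeof v);
}

/* Multiply n floats by s; src and dst need only float alignment. */
void scale_floats(float *dst, const float *src, size_t n, float s)
{
    /* Broadcast the scalar by hand: vector * vector is accepted by
     * gcc versions that reject vector * scalar. */
    v4sf vs = { s, s, s, s };
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        store_unaligned(dst + i, load_unaligned(src + i) * vs);

    /* Scalar loop for the remaining 0..3 elements. */
    for (; i < n; i++)
        dst[i] = src[i] * s;
}
```

Calling scale_floats(out + 1, in + 1, 8, 2.0f) on float arrays exercises
the case where the pointers are not 16-byte aligned; comparing the
output of gcc -O2 -S with and without a -march option shows which moves
were chosen.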
--
Tim Prince