Re: generating unaligned vector load instructions?


 



On 9/18/2013 7:01 PM, Norbert Lange wrote:
Hello Tim,

can you specify which versions, maybe post the command line, or try compiling for 32-bit (-m32 switch)? Also, I don't understand the comment about splitting. To avoid misunderstanding: the generated code segfaults on my Athlon X2, so it's not a question of optimal code, but of code that actually works.

I'm unable to generate the right instruction, and I don't exactly know why it should differ between versions (... except bugs of course...). I just want to know the right way to force unaligned loads without inline assembly.

Btw: The code doesn't compile on gcc < 4.7, as I just realised; older versions can't multiply vectors by scalars.
I wasn't even certain which of my gcc installations had 32-bit counterparts, but Red Hat 4.4.6 appeared to accept your code for -m64 and reject it for -m32. Intel icc, which shares a lot with the active gcc, rejected your code. Many people here advocate options such as -pedantic -Wall to increase the number of warnings, so you will get those warnings even where gcc accepts your code.

I thought the X2 could accept nearly all normal SSE2 code (the original Turion didn't), but I guess you want to test its limits. Now that you've revealed your actual target, someone might suggest a more appropriate arch option. Did you read about the errata for this instruction on your chip? http://support.amd.com/us/Processor_TechDocs/25759.pdf

Splitting unaligned 128-bit moves into separate 64-bit moves was a common tactic, likely to improve performance on CPUs prior to AMD Barcelona and Intel Nehalem (not to mention avoiding bugs in the hardware implementation). It probably didn't hurt to split the instruction explicitly on a CPU where the hardware would split it anyway (I thought this might be true of the X2). Even with Intel Westmere there were situations where splitting might improve performance. So gcc can't be faulted if it makes that translation when you didn't tell it to compile for a more recent CPU, or when you specify a target which is known to have problems with certain instructions.
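As for asking for an unaligned load without inline assembly, here is a minimal sketch of one common approach: copy the bytes into a vector-typed variable with memcpy and let the compiler choose the instruction. The names v4f, load_unaligned, store_unaligned and scale_floats are made up for illustration and are not from your code. I would expect gcc to emit either a single unaligned move or a split pair of 64-bit moves depending on the -march/-mtune target, but I have not verified what it does for the X2.

```c
#include <stddef.h>
#include <string.h>

/* GCC vector extension: four packed floats, 16 bytes. */
typedef float v4f __attribute__((vector_size(16)));

/* Load four floats from an address that may not be 16-byte aligned.
   memcpy makes no alignment assumption, so the compiler must not
   emit an aligned-only load here. */
static inline v4f load_unaligned(const float *p)
{
    v4f v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Store four floats to an address that may not be 16-byte aligned. */
static inline void store_unaligned(float *p, v4f v)
{
    memcpy(p, &v, sizeof v);
}

/* Multiply n floats by k. Uses vector * vector rather than
   vector * scalar, since the latter needs gcc >= 4.7. */
void scale_floats(float *dst, const float *src, size_t n, float k)
{
    v4f kv = { k, k, k, k };
    size_t i = 0;

    for (; i + 4 <= n; i += 4)
        store_unaligned(dst + i, load_unaligned(src + i) * kv);

    for (; i < n; i++)   /* scalar tail for the leftover elements */
        dst[i] = src[i] * k;
}
```

Checking the generated assembly (gcc -S) with and without an explicit -march option should show whether the move is being split for your target.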

--
Tim Prince




