Re: Packed-simd SSE for only vectorized loops


 



Fahimeh Yazdanpanah wrote:

I am working on autovectorization. Using gcc-4.3.3 on 64-bit Ubuntu on an Intel Core 2, with the vectorization flags enabled, I found that gcc produces packed-SIMD SSE opcodes for all instructions in vectorized loops and for some instructions in non-vectorized loops. Would you please let me know if there is a flag or switch to stop gcc from producing packed-SIMD instructions for non-vectorized loops? Or is there any way to distinguish between the packed-SIMD SSE instructions within vectorized loops and those within non-vectorized loops?

There are many situations where it is correct for the compiler to use parallel (packed) instructions even though only the scalar operand is used. For register-to-register moves, the parallel move instructions have been preferred since they were first documented for the Athlon-32: they permit hardware register renaming, because unlike a scalar move they carry no requirement to preserve the contents of the unused slots. gcc also applies a similar work-around for performance stalls in certain conversions between float and double. Back in the Athlon-32 days, these optimizations were performed only when -march=athlonxxx switches were set. If it's worth it to you, you might try to find out what was changed to fix the performance problem for the Intel targets.

If you have an interesting story about what you intend to accomplish by spending the time to preserve the extra register slots during scalar moves, let's hear it. However, you can't expect gcc to support a mode where you cache data in those extra slots via asm while still allowing normal compilation and optimization of the source code.
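To illustrate the register-renaming point, here is a minimal sketch (x86/x86-64 with SSE assumed) that models the two move semantics with SSE intrinsics. The helper names `scalar_move`, `packed_move` and `lane` are hypothetical, chosen for illustration only. `_mm_move_ss` has the semantics of a scalar register-to-register MOVSS: only lane 0 is replaced and lanes 1-3 of the destination survive, so the result depends on the old destination value. A full-register copy (what MOVAPS does) takes all four lanes from the source, so the old destination is irrelevant and the hardware can rename freely. Note that the intrinsics describe semantics only; the compiler may pick different instructions to implement them.

```c
#include <xmmintrin.h>  /* SSE intrinsics; x86/x86-64 only */

/* Semantics of a scalar move (MOVSS xmm, xmm): lane 0 comes from src,
   lanes 1-3 are kept from dst, so the result depends on the old dst. */
static __m128 scalar_move(__m128 dst, __m128 src)
{
    return _mm_move_ss(dst, src);
}

/* Semantics of a full-register copy (MOVAPS xmm, xmm): every lane comes
   from src, so the old dst value plays no part in the result. */
static __m128 packed_move(__m128 dst, __m128 src)
{
    (void)dst;  /* deliberately unused: no dependency on the old value */
    return src;
}

/* Read lane i (0..3) of a vector through memory. */
static float lane(__m128 v, int i)
{
    float out[4];
    _mm_storeu_ps(out, v);
    return out[i];
}
```

With `old = _mm_setr_ps(9, 8, 7, 6)` and `val = _mm_setr_ps(1, 2, 3, 4)`, `scalar_move(old, val)` yields lanes {1, 8, 7, 6} while `packed_move(old, val)` yields {1, 2, 3, 4}. The merge in the first case is exactly the dependency on the unused slots that the packed move avoids.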


