On Tue, 14 Aug 2007, Dorit Nuzman wrote:
In this case, 4.3 will vectorize the loop on 15.
The others look like reduction patterns that are just too complex for it
right now.
Feel free to file a missed optimization bug on it :)
Actually there's already a PR for it - PR32824. I'm getting more and more
testcases where this pattern occurs... I hope the generic reduction
detection will be ready in the near future...
Another problem is that GCC seems to make little effort to automatically
align arrays to a 16-byte boundary, and there's no excuse for that in the
case of static ones. (Unless, of course, I'm misinterpreting the
vectorizer report "vectorizing unaligned access".)
Vectorizing operations on unaligned arrays is considerably less efficient.
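
To illustrate the alignment point, here is a minimal sketch of requesting
the boundary by hand, assuming GCC's `aligned` variable attribute. The
names `samples`, `N`, `is_aligned16` and `scale` are made up for the
example and are not from the testcase under discussion.

```c
#include <assert.h>
#include <stdint.h>

#define N 1024  /* hypothetical array length, not from the testcase */

/* GCC-specific: ask for a 16-byte boundary on a static array. */
static float samples[N] __attribute__((aligned(16)));

/* Returns 1 if p is on a 16-byte boundary, 0 otherwise. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Simple element-wise loop over the aligned array. */
static void scale(float factor)
{
    int i;

    assert(is_aligned16(samples));
    for (i = 0; i < N; i++)
        samples[i] *= factor;
}
```

Whether the vectorizer report then stops mentioning unaligned access for
such a loop would still need checking against the compiler version in use.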
On a separate note, why is (float * float) getting transformed to a
powf() call (according to the vectorizer report, again), when multiplying
seems to be faster for low powers?
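
For clarity, this is the kind of thing I mean by multiplying for low
powers. It is only a sketch: `square` and `pow_small` are hypothetical
helpers written for this mail, not anything GCC provides or emits.

```c
/* The case from the report: a plain square, one multiplication. */
static float square(float x)
{
    return x * x;
}

/* Hypothetical helper: x raised to a small non-negative integer power,
   using n multiplications and no library call. */
static float pow_small(float x, unsigned n)
{
    float r = 1.0f;
    unsigned i;

    for (i = 0; i < n; i++)
        r *= x;
    return r;
}
```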
And is there an ETA on a vectorizable sinf()?
Gordan