On 8/30/2013 2:00 PM, Nagaraju Mekala wrote:
I was working with gcc 4.7.2 on x86. I found that GCC was not able to
vectorize small, simple loops with negative offsets.
Assume ntimes and LEN are constants; in my case they are both 200000.
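#define LEN 200000      /* both 200000 in my case */
#define ntimes 200000
float a[LEN], b[LEN];

int abc()
{
    for (int nl = 0; nl < ntimes*3; nl++) {
        for (int i = LEN - 1; i >= 0; i--) {
            a[i] = b[i] + (float) 1.;
        }
    }
    return 0;
}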
If we modify the above code as below, GCC vectorizes it.
int abc()
{
    for (int nl = 0; nl < ntimes*3; nl++) {
        for (int i = 0; i <= LEN - 1; i++) {
            a[i] = b[i] + (float) 1.;
        }
    }
    return 0;
}
Can anyone explain why GCC is not able to vectorize it?
Thanks,
Nagaraju
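I would call this a negative stride; I don't see any offsets in your
example. If this is 32-bit mode, you must be specifying an -march option,
or you would not get vectorization even with the loop reversed (the
default 32-bit target typically has no SSE). You must also be specifying
-std=c99 or a C++ mode, since the for (int ...) declarations are not
accepted in gcc-4.7's default C mode.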
This is well known. There are compilers, e.g. Sun Studio, that will
vectorize a stride of -1, but without the peeling for alignment that is
desirable on several common platforms (including the defaults for
x86_64 and the older vector ones for 387).
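To make "peeling for alignment" concrete, here is a minimal plain-C sketch of how a descending loop like the one in the question might be laid out when the compiler does peel. It assumes 4-float (16-byte, SSE-sized) vectors; the names add_one_reverse_peeled and VF are hypothetical, and the inner k loop only stands in for one aligned vector load/add/store. A compiler that skips the peeling would instead run whole blocks from the top using unaligned accesses.

```c
#include <stdint.h>

/* Hypothetical vector width: 4 floats = 16 bytes (SSE-sized). */
#define VF 4

/* Computes a[i] = b[i] + 1.0f for i = n-1 down to 0, split into the
   three phases an alignment-peeling vectorizer might produce. */
void add_one_reverse_peeled(float *a, const float *b, int n)
{
    int i = n - 1;

    /* Prologue: peel scalar iterations from the top until the next
       block a[i-3..i] starts on a 16-byte boundary.  That block starts
       at a + i - 3, which is exactly 16 bytes below a + i + 1. */
    while (i >= 0 && ((uintptr_t)(a + i + 1) % (VF * sizeof(float))) != 0) {
        a[i] = b[i] + 1.0f;
        i--;
    }

    /* Body: whole aligned blocks of VF elements, walking downward.
       The inner loop stands in for one vector load, add and store. */
    for (; i >= VF - 1; i -= VF) {
        for (int k = 0; k < VF; k++)
            a[i - (VF - 1) + k] = b[i - (VF - 1) + k] + 1.0f;
    }

    /* Epilogue: fewer than VF elements remain at the bottom. */
    for (; i >= 0; i--)
        a[i] = b[i] + 1.0f;
}
```

Only a[]'s alignment is forced here; b[] may still be misaligned relative to a[], which is one reason peeling helps more on some targets than others.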
I don't know how fundamental this is to gcc. For targets with gather
instructions (not supported by gcc-4.7) those can be used to vectorize
without reversing the loop (not necessarily with full efficiency).
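The gather idea can be sketched in plain C as well. In this hedged illustration, lane k of each step handles iteration i - k, so the loop keeps its original descending order and the lanes are filled through a descending index vector. The helper gather_f32 is a hypothetical scalar emulation of a hardware gather (one instruction on targets that have it), and the names VF and add_one_reverse_gather are likewise assumptions for the sketch.

```c
/* Hypothetical vector width: 4 floats. */
#define VF 4

/* Scalar emulation of a vector gather: lane k receives base[idx[k]]. */
static void gather_f32(float lanes[VF], const float *base, const int idx[VF])
{
    for (int k = 0; k < VF; k++)
        lanes[k] = base[idx[k]];
}

/* a[i] = b[i] + 1.0f for i = n-1 down to 0, without reversing the loop. */
void add_one_reverse_gather(float *a, const float *b, int n)
{
    int i = n - 1;
    for (; i >= VF - 1; i -= VF) {
        int idx[VF];
        float lanes[VF];
        for (int k = 0; k < VF; k++)
            idx[k] = i - k;              /* descending index vector */
        gather_f32(lanes, b, idx);       /* b[i], b[i-1], b[i-2], b[i-3] */
        for (int k = 0; k < VF; k++)
            lanes[k] += 1.0f;            /* stands in for the vector add */
        /* The store needs the same descending indices: a scatter, or a
           lane reversal followed by a contiguous store. */
        for (int k = 0; k < VF; k++)
            a[idx[k]] = lanes[k];
    }
    for (; i >= 0; i--)                  /* scalar remainder */
        a[i] = b[i] + 1.0f;
}
```

The store side shows why this is "not necessarily with full efficiency": a gather handles only the loads.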
Your example also brings up the question of whether you expect the
compiler to eliminate duplicated code automatically (the compiler may
see that your outer loop can be collapsed to a single iteration).
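The last point can be checked on a small scale. Each pass of the outer loop reads only b and writes a[i] = b[i] + 1, so every pass after the first stores the same values again; provided a and b do not overlap (true for distinct global arrays), one pass is equivalent to many. The sketch below uses a hypothetical size N = 8 in place of LEN and hypothetical function names; it is an empirical check, not the proof a compiler would need before collapsing the loop.

```c
#include <string.h>

#define N 8   /* small stand-in for LEN */

/* Runs the inner loop from the question 'reps' times. */
static void kernel_repeated(float *a, const float *b, int n, int reps)
{
    for (int nl = 0; nl < reps; nl++)
        for (int i = n - 1; i >= 0; i--)
            a[i] = b[i] + 1.0f;
}

/* Returns 1 if 'reps' passes leave a[] identical to a single pass. */
int outer_loop_is_redundant(int reps)
{
    float b[N], a1[N], ar[N];
    for (int i = 0; i < N; i++)
        b[i] = (float)i * 0.5f;
    memset(a1, 0, sizeof a1);
    memset(ar, 0, sizeof ar);
    kernel_repeated(a1, b, N, 1);
    kernel_repeated(ar, b, N, reps);
    return memcmp(a1, ar, sizeof a1) == 0;
}
```

For example, outer_loop_is_redundant(3) compares one pass with the three passes a single value of ntimes would give.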
--
Tim Prince