Hi, Ira.
I wrote this code to exercise vectorization, or lack thereof;
it operates precisely as you described:
#include <stdlib.h>
#include <stdio.h>

int main( int argc, char *argv[])
{
    int N;
    int *vec;
    int i;

#ifdef VECTORIZE
    N = 103;
#else // VECTORIZE
    sscanf("103", "%d", &N);
#endif // VECTORIZE
    printf( "N is: %d\n", N);
    vec = (int*) malloc( sizeof(int) * N);
    for( i=0; i<N; i++) {
        vec[i] = i;
    }
    for( i=0; i<10; i++) {
        printf( "%d,", vec[i]);
    }
    free( vec);
    exit(0);
}
If I compile with "gcc -O3 vectorize.c -DVECTORIZE",
it vectorizes nicely, as long as N>7 (on my Opteron/Ubuntu system).
If I compile with "gcc -O3 vectorize.c",
no vectorization takes place, as you noted.
I see there is a "#pragma novector"; my naive wish here is for a
"#pragma vector", which would strongly encourage vectorization,
even in the absence of known iteration count. That would require,
presumably, a loop-peeling loop, followed by
a possibly-zero-iteration vector loop.
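Written out by hand, the peel-then-vector shape described above might look roughly like the sketch below. The names fill_iota and VF are hypothetical, the 16-byte alignment target is an assumption about SSE, and the fixed-count inner loop only stands in for what would be a single vector operation in compiler output.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define VF 4  /* assumed vectorization factor: 4 ints per 16-byte register */

/* Hypothetical helper: stores 0..n-1 into vec in three phases. */
static void fill_iota(int *vec, int n)
{
    int i = 0;
    int j;

    assert(vec != NULL || n <= 0);

    /* 1. Peel loop: scalar iterations until &vec[i] is 16-byte aligned. */
    while (i < n && ((uintptr_t)&vec[i] % (VF * sizeof(int))) != 0) {
        vec[i] = i;
        i++;
    }

    /* 2. Main loop: whole groups of VF elements.
          Runs zero times if fewer than VF elements remain. */
    for (; i + VF <= n; i += VF) {
        for (j = 0; j < VF; j++) {
            vec[i + j] = i + j;
        }
    }

    /* 3. Epilogue: the leftover scalar iterations. */
    for (; i < n; i++) {
        vec[i] = i;
    }
}
```

None of the three phases needs N at compile time; each only tests it at run time.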
In my case, the generated C code has already had small arrays
(when we know array sizes statically) unrolled or eliminated in
other ways, so most of the remaining FOR-loops would benefit from
vectorization, even in the absence of iteration count.
I could manually strip-mine these loops to get a fixed iteration
count to enable vectorization,
but gcc should be doing that job for me, IMO.
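For reference, a minimal sketch of what that manual strip-mining could look like. The function name fill_strip_mined and the constant STRIP are hypothetical; 8 is chosen only because vectorization was observed above for N>7, and whether gcc actually vectorizes the inner loop would still need checking.

```c
#include <assert.h>
#include <stddef.h>

#define STRIP 8  /* assumed strip length: a small compile-time constant */

/* Hypothetical strip-mined form of the vec[i] = i loop. */
static void fill_strip_mined(int *vec, int n)
{
    int base;
    int j;

    assert(vec != NULL || n <= 0);

    /* Full strips: the inner trip count is the constant STRIP. */
    for (base = 0; base + STRIP <= n; base += STRIP) {
        for (j = 0; j < STRIP; j++) {
            vec[base + j] = base + j;
        }
    }

    /* Remainder: fewer than STRIP elements are left. */
    for (j = base; j < n; j++) {
        vec[j] = j;
    }
}
```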
If someone could suggest a nice way to get such a pragma, or to
otherwise encourage the compiler to lean more in the vectorization
direction, I'm all ears. I'd even undertake to write the
pragma, if it's not a huge effort. (I know close to zilch about
gcc internals...)
Rationale: There are many problems where it is simply not
possible to know array sizes statically. For example, analysis of
database queries ("What was the mean number of shares of IBM
traded, per share on the NYSE today?"). Data mining problems
also fall into this category.
Regards,
Robert
Ira Rosen wrote:
gcc-help-owner@xxxxxxxxxxx wrote on 03/06/2010 09:37:01 PM:
Hi. I'm having a problem with GCC vectorization on an Opteron 165.
I have two codes, which are, unfortunately, machine-generated
and large, which differ, as far as I can tell, only in the source
of the loop size, N, for a loop roughly of this form:
for( i=0; i<N; i++) {
vec[i] = i;
}
In both cases, N comes from another function and is theoretically
not inlined. In the first case, N is generated by an identity
function that hides its value; this case vectorizes nicely,
if the presence of punpckldq instructions is suitable evidence.
(papiex confirms vectorization with high PAPI_VEC_INS counts.)
In the other case, N comes from a sscanf, and is very well hidden,
since it comes from the command line, ultimately. This case
does not vectorize, at present. It did vectorize some months ago...
This is on: gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu12)
Neither the compiler nor the OS has changed in that time; the
code going into gcc has, of course, changed as the sac2c compiler
has evolved.
So, are there some subtle (or not subtle...) criteria that gcc has
for deciding when to emit vector ops, based on array size, perhaps?
Alternately, if someone can point me at the relevant gcc source code,
maybe I can get an idea as to what's going on. Or, if there is
a bugzilla site for it, I'll take a look there.
Auto-vectorization can fail if the number of iterations can't be computed.
The vectorizer calls number_of_exit_cond_executions() in
tree-scalar-evolution.c to determine the loop bound.
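A rough illustration of what a computable iteration count means, as a sketch rather than anything taken from the gcc sources. Both function names are hypothetical. The first loop's trip count is fixed on entry; the second exits on data read inside the loop, which is the kind of loop that would typically be rejected.

```c
#include <assert.h>
#include <stddef.h>

/* Countable: the loop runs exactly n times, and n is known on entry. */
static void fill_counted(int *vec, int n)
{
    int i;

    assert(vec != NULL || n <= 0);
    for (i = 0; i < n; i++) {
        vec[i] = i;
    }
}

/* Not countable in advance: the exit depends on values loaded from src.
   Copies src into vec until a negative sentinel; returns the count copied. */
static int copy_until_negative(int *vec, const int *src)
{
    int i = 0;

    assert(vec != NULL && src != NULL);
    while (src[i] >= 0) {
        vec[i] = src[i];
        i++;
    }
    return i;
}
```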
HTH,
Ira
Thanks,
Robert