Thanks, Ira.
That gives me a hint as to what's going on.
(I could swear that the failing example vectorized
in an earlier life, even though it was reading
the iteration count from a file. Unfortunately, I have
no proof of this!)
I'll look into the code you cited and see what
guidance I can find there.
There is, of course, a substantial difference in performance
between vectorized and non-vectorized codes on large
arrays, so I am keen to see what we can do here, in
the cases where we do not know iteration counts statically.
I am guessing (I just received your message, so have not
read the code you recommend...) that the requirement for
static knowledge of iteration count is based on a need to
peel the first/last iterations from the loop, so that
the remaining iterations fill the vector registers
exactly.
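
To make that guess concrete, here is a hand-written sketch of the peeling idea. It is only an illustration, not what GCC emits: fill_iota and the vector width VF = 4 (four 32-bit ints per 128-bit SSE register) are assumptions of mine. Note that in this form the split point is computed at run time, so the count only has to be known on entry to the loop.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed vector width: four 32-bit ints per 128-bit SSE register. */
#define VF 4

/* Hypothetical helper, not taken from GCC or sac2c: writes 0..n-1 into
   vec, with n known only at run time. */
void fill_iota(int *vec, size_t n)
{
    size_t i = 0;
    size_t k;
    size_t main_end = n - (n % VF);  /* largest multiple of VF that is <= n */

    assert(vec != NULL || n == 0);

    /* Main body: each trip covers VF consecutive elements, the shape that
       maps onto one vector store. */
    for (; i < main_end; i += VF)
        for (k = 0; k < VF; k++)
            vec[i + k] = (int)(i + k);

    /* Scalar epilogue: the remaining 0..VF-1 elements. */
    for (; i < n; i++)
        vec[i] = (int)i;
}
```

With n = 11, the main body covers elements 0 to 7 in two trips and the epilogue handles 8, 9 and 10.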
The problem I have on code generation is very similar to yours:
I have array expressions described as loops, yet some of those
expressions are over array shape vectors, so may only be
a few (say 1-4) elements, and vectorization is a net loss.
Perhaps #pragma directives to the compiler could help here,
at least in some cases.
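
Short of a pragma, one thing a code generator could do by itself is branch on the length before the loop. The sketch below assumes a cut-off of 4 elements and a made-up kernel name, add_arrays; whether a given GCC then treats the two paths differently would have to be checked in the generated assembly.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cut-off; the right value would have to be measured. */
#define SHORT_LIMIT 4

/* Hypothetical generated kernel: dst[i] = a[i] + b[i]. */
void add_arrays(double *dst, const double *a, const double *b, size_t n)
{
    size_t i;

    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));

    if (n <= SHORT_LIMIT) {
        /* Short path: the trip count is bounded by a constant, which at
           least gives the compiler the chance to unroll it instead of
           setting up a vector loop. */
        for (i = 0; i < n && i < SHORT_LIMIT; i++)
            dst[i] = a[i] + b[i];
        return;
    }

    /* Long path: the candidate for vectorization. */
    for (i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

The cost is one extra compare per call, which should be small next to the vector prologue and epilogue it avoids on shape-vector-sized arguments.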
At any rate, you have pointed me in the direction where I should
be able to find an answer, or at least ask a more precise
question or two.
Thanks again,
Robert
Ira Rosen wrote:
gcc-help-owner@xxxxxxxxxxx wrote on 03/06/2010 09:37:01 PM:
Hi. I'm having a problem with GCC vectorization on an Opteron 165.
I have two codes, which are, unfortunately, machine-generated
and large, which differ, as far as I can tell, only in the source
of the loop size, N, for a loop roughly of this form:
    for (i = 0; i < N; i++) {
        vec[i] = i;
    }
In both cases, N comes from another function and is theoretically
not inlined. In the first case, N is generated by an identity
function that hides its value; this case vectorizes nicely,
if the presence of punpckldq instructions is suitable evidence.
(papiex confirms vectorization with high PAPI_VEC_INS counts.)
In the other case, N comes from a sscanf, and is very well hidden,
since it comes from the command line, ultimately. This case
does not vectorize, at present. It did vectorize some months ago...
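
A cut-down sketch of the two cases follows. The names hide, parse_count and fill are made up for illustration; the real code is machine-generated and much larger, so this may well not reproduce the problem.

```c
#include <assert.h>
#include <stdio.h>

/* Case 1: an identity function that hides the value of N. The noinline
   attribute is an assumption standing in for "theoretically not inlined". */
__attribute__((noinline)) int hide(int n)
{
    return n;
}

/* Case 2: N parsed from text, as it would be from argv[1].
   Returns -1 if no integer could be read. */
int parse_count(const char *s)
{
    int n = -1;

    assert(s != NULL);
    if (sscanf(s, "%d", &n) != 1)
        return -1;
    return n;
}

/* The loop in question; n is not known at compile time in either case. */
void fill(int *vec, int n)
{
    int i;

    for (i = 0; i < n; i++)
        vec[i] = i;
}
```

Calling fill(vec, hide(1000)) corresponds to the first case and fill(vec, parse_count(argv[1])) to the second. If I have the option right, compiling with -O3 -ftree-vectorizer-verbose=2 should make the vectorizer report which loops it handled.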
This is on: gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu12)
Neither the compiler nor the OS has changed in that time; the
code going into gcc has, of course, changed as the sac2c compiler
has evolved.
So, are there some subtle (or not subtle...) criteria that gcc has
for deciding when to emit vector ops, based on array size, perhaps?
Alternately, if someone can point me at the relevant gcc source code,
maybe I can get an idea as to what's going on. Or, if there is
a bugzilla site for it, I'll take a look there.
Auto-vectorization can fail if the number of iterations can't be computed.
The vectorizer calls number_of_exit_cond_executions() in
tree-scalar-evolution.c to determine the loop bound.
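
As a rough illustration of the distinction, assuming "can't be computed" means that no trip count is available on entry to the loop: the first function below has a count that is fixed before the loop starts, even though it is only known at run time, while the second exits on data read inside the loop. Both function names are made up, and this is not a statement of what number_of_exit_cond_executions() actually accepts.

```c
#include <assert.h>
#include <stddef.h>

/* Countable: the loop runs exactly n times, and n is fixed before the
   loop is entered, even though its value is only known at run time. */
void scale(float *x, size_t n, float k)
{
    size_t i;

    assert(x != NULL || n == 0);
    for (i = 0; i < n; i++)
        x[i] *= k;
}

/* Not countable in that sense: the exit test depends on data loaded
   inside the loop, so no trip count exists up front.
   Copies up to, but not including, the first 0 in src and returns the
   number of elements copied. */
size_t copy_until_zero(int *dst, const int *src)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    while (src[i] != 0) {
        dst[i] = src[i];
        i++;
    }
    return i;
}
```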
HTH,
Ira
Thanks,
Robert