On 12/02/2009 07:41 AM, Ira Rosen wrote:
Benjamin Redelings I <benjamin_redelings@xxxxxxxx> wrote on 01/12/2009 18:06:49:
Hi Ira,
1. That's actually quite helpful :-) And thank you for all your work
on this! I can't wait to go make sure my actual code is autovectorized.
Anyway, I didn't see this because I didn't use
-ftree-vectorizer-verbose=9. Would it be possible to mention this at
verbosity levels less than 9? Ideally, level 2, which tells me which
loops aren't vectorized without mentioning all the cost model parameters.
The problem with this is that the vectorizer goes through all the
statements in the loop and checks whether they are vectorizable in different
ways. (One of the possibilities, for example, is a reduction.) If one of the
possibilities fails, it doesn't mean that the statement/loop is not
vectorizable. Vectorization fails only if all of them fail. Therefore,
printing an error message for every vectorization possibility is not such a
good idea. But I'll try to see how we can improve the level 2 messages.
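To illustrate the "only if all of them fail" logic, here is a minimal sketch in C. The enum, the check functions, and stmt_vectorizable are hypothetical names made up for this illustration; they are not GCC's internal API, and the real analysis is far more involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical statement kinds, for illustration only. */
enum stmt_kind { STMT_ARITH, STMT_REDUCTION, STMT_LOAD_STORE, STMT_CALL };

typedef bool (*check_fn)(enum stmt_kind);

/* Each check stands in for one way a statement might be vectorized. */
static bool check_arith(enum stmt_kind k)      { return k == STMT_ARITH; }
static bool check_reduction(enum stmt_kind k)  { return k == STMT_REDUCTION; }
static bool check_load_store(enum stmt_kind k) { return k == STMT_LOAD_STORE; }

/* A statement is accepted as soon as ANY check accepts it.
   It is rejected only after EVERY check has rejected it. */
bool stmt_vectorizable(enum stmt_kind k)
{
    static const check_fn checks[] = {
        check_arith, check_reduction, check_load_store
    };
    for (size_t i = 0; i < sizeof checks / sizeof checks[0]; i++)
        if (checks[i](k))
            return true;   /* earlier failed checks were not errors */
    return false;          /* only now is there something to report */
}
```

With this structure, check_arith rejecting a reduction statement is an expected step, not a failure, which is why printing a message for each rejected possibility would be noisy at low verbosity levels.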
Great! Improved level 2 messages will be extremely useful.
(BTW, the gcc man pages indicate that 7 is the highest value for
tree-vectorizer-verbose, although it seems that 9 is now the highest
value.)
Thanks, I'll fix this.
Great!
...
3. Finally, the following loop is not even mentioned as not being
vectorized (at least not that I could find!):
    for(i=0;i<argc;i++)
        sum += d1[i]*d2[i]*d3[i]*d4[i];
Here d1, d2, d3, and d4 are double*. However, the loop is recognized if
they are float*, OR if they are double* but there are only three of
them. I presume this is intended... can you explain why?
No :). I tried to compile it and it looks normal. Could you please attach
the whole file and specify the exact command line? (The only reason I can
think of is that, somehow, the loop gets optimized out).
Ira
Umm... you are right, it was optimized out :-P Now that I have correctly
added a use of the sum, the loop is reported as vectorized. I guess
this raises another question about how to interpret the output of
tree-vectorizer-verbose. Is there a good place on the wiki to add the
following information, to help newcomers get up to speed?
* -ffast-math is required to vectorize floating-point dot products and
other sums, since vectorizing a reduction reorders the additions.
* if the loop is optimized out, then there will be no report of its
vectorizability.
* (?) this is the ONLY reason that there would be no such report.
* if a function is inlined into another function, then its
vectorizability will be reported again, in that function.
* if all uses of a function are inlined, then there will be no
"vectorized n loops in function" message.
* functions are (of course) not analyzed by the vectorizer in the order
in which they are declared.
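The point about optimized-out loops can be reproduced with a small sketch like the one below. The function names dot4_unused and dot4_used are made up for this example. Compiled with something like gcc -O3 -ffast-math -ftree-vectorizer-verbose=2, I would expect the first loop to be removed as dead code (and so not reported at all) and the second to be reported, but the exact behaviour depends on the compiler version and flags.

```c
#include <stddef.h>

/* The sum is computed but never used, so the compiler is free to
   delete the whole loop before the vectorizer ever looks at it. */
void dot4_unused(const double *restrict d1, const double *restrict d2,
                 const double *restrict d3, const double *restrict d4,
                 size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += d1[i] * d2[i] * d3[i] * d4[i];
    (void)sum;  /* silences the unused-variable warning; not a real use */
}

/* The sum is returned, so the loop has an observable result and must
   be kept; this is the version the vectorizer can report on. */
double dot4_used(const double *restrict d1, const double *restrict d2,
                 const double *restrict d3, const double *restrict d4,
                 size_t n)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += d1[i] * d2[i] * d3[i] * d4[i];
    return sum;
}
```

Passing the result to an external function, as the attached file does with s2() and s4(), has the same effect as returning it.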
I guess this doesn't matter now, but here is the corrected file, for
reference.
Thanks a lot!
-BenRI
#define ALIGNED __attribute__((aligned(16)))

float* ALIGNED s1(int);
double* ALIGNED s3(int);
int s2(float);
int s4(double);

#define RESTRICT __restrict
// #define RESTRICT

int main(int argc, char* argv[])
{
    float* RESTRICT f1 ALIGNED = s1(0);
    float* RESTRICT f2 ALIGNED = s1(1);
    float* RESTRICT f3 ALIGNED = s1(2);
    float* RESTRICT f4 ALIGNED = s1(3);

    double* RESTRICT d1 ALIGNED = s3(0);
    double* RESTRICT d2 ALIGNED = s3(1);
    double* RESTRICT d3 ALIGNED = s3(2);
    double* RESTRICT d4 ALIGNED = s3(3);

    float sum=0;
    int i;

    for(i=0;i<argc;i++)
        sum += f1[i]*f2[i];
    s2(sum);

    sum = 0;
    for(i=0;i<argc;i++)
        f4[i] += f1[i]*f2[i]*f3[i];
    s2(sum);

    sum = 0;
    for(i=0;i<argc;i++)
        sum += f1[i]*f2[i]*f3[i]*f4[i];
    s2(sum);

    double sum2=0;
    for(i=0;i<argc;i++)
        sum2 += d1[i]*d2[i];
    s4(sum2);

    sum2 = 0;
    for(i=0;i<argc;i++)
        d4[i] += d1[i]*d2[i]*d3[i];
    s4(sum2);

    sum2 = 0;
    for(i=0;i<argc;i++)
        sum2 += d1[i]*d2[i]*d3[i]*d4[i];
    s4(sum2);

    return 0;
}