Re: GCC vectorization problem on X86

gcc-help-owner@xxxxxxxxxxx wrote on 06/06/2010 08:40:56 PM:

> Hi, Ira.
>
> I wrote this code to exercise vectorization, or lack thereof;
> it operates precisely as you described:
>
> #include <stdlib.h>
> #include <stdio.h>
>
> int main( int argc, char *argv[])
> {
>    int N;
>    int *vec;
>    int i;
>
> #ifdef VECTORIZE
>     N = 103;
> #else // VECTORIZE
>     sscanf("103", "%d", &N);
> #endif // VECTORIZE
>     printf( "N is: %d\n", N);
>     vec = (int*) malloc( sizeof(int) * N);
>
>     for( i=0; i<N; i++) {
>       vec[i] = i;
>     }
>     for( i=0; i<10; i++) {
>       printf( "%d,", vec[i]);
>     }
>     free( vec);
>
>     exit(0);
> }
>
> If I compile with "gcc -O3 vectorize.c -DVECTORIZE",
> it vectorizes nicely, as long as N>7 (on my Opteron/Ubuntu system).
> If I compile with "gcc -O3 vectorize.c",
> no vectorization takes place, as you noted.
>
> I see there is a "#pragma novector"; my naive wish here is for a
> "#pragma vector", which would strongly encourage vectorization,
> even in the absence of known iteration count. That would require,
> presumably, a loop-peeling loop, followed by
> a possibly-zero-iteration vector loop.
>
> In my case, the generated C code has already had small arrays
> (when we know array sizes statically) unrolled or eliminated in
> other ways, so most of the remaining FOR-loops would benefit from
> vectorization, even in the absence of iteration count.
> I could manually strip-mine these loops to get a fixed iteration
> count to enable vectorization,
> but gcc should be doing that job for me, IMO.

Maybe GRAPHITE can take care of this.
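
For reference, the manual strip-mining mentioned above could look roughly
like the sketch below. The block size of 4 (four 32-bit ints per 128-bit
SSE register) and the function name fill_strip_mined are assumptions made
for illustration. Whether a given gcc version then vectorizes the inner
loop has to be checked in the generated assembly; the sketch only shows
the shape of the transformation.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed block size: 4 ints fill one 128-bit SSE register. */
#define BLOCK 4

/* Fill vec[0..n-1] with 0..n-1 using a strip-mined loop.
 * Hypothetical helper, not part of the original test program. */
static void fill_strip_mined(int *vec, int n)
{
    int i, j;
    int main_end;

    assert(vec != NULL);

    /* Largest multiple of BLOCK that does not exceed n. */
    main_end = n - (n % BLOCK);

    /* Main part: the inner loop always runs exactly BLOCK times,
     * so its iteration count is a compile-time constant. */
    for (i = 0; i < main_end; i += BLOCK) {
        for (j = 0; j < BLOCK; j++) {
            vec[i + j] = i + j;
        }
    }

    /* Remainder: at most BLOCK-1 leftover elements, handled one by one. */
    for (i = main_end; i < n; i++) {
        vec[i] = i;
    }
}
```

In the test program this would replace the first for-loop with a call
such as fill_strip_mined(vec, N).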

>
> If someone could suggest a nice way to get such a pragma, or to
> otherwise encourage the compiler to lean more in the vectorization
> direction, I'm all ears. I'd even undertake to write the
> pragma, if it's not a huge effort. (I know close to zilch about
> gcc internals...)

Here is an old discussion of vectorizer pragmas:
http://gcc.gnu.org/ml/gcc-patches/2005-02/msg01560.html
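
To illustrate the iteration-count condition from the earlier reply quoted
below, here is a small sketch of two loop shapes. In the first, the bound
n does not change inside the loop, so the number of iterations can be
expressed before the loop starts, even though its value is only known at
run time. In the second, the exit depends on data read inside the loop, so
no such expression exists. Both function names are made up for this
example, and the sketch says nothing about what a particular gcc version
actually does with either loop.

```c
#include <assert.h>
#include <stddef.h>

/* Counted loop: runs exactly n times (for n >= 0); n is loop-invariant.
 * Hypothetical helper for illustration. */
static void fill_counted(int *vec, int n)
{
    int i;

    assert(vec != NULL);
    for (i = 0; i < n; i++) {
        vec[i] = i;
    }
}

/* Data-dependent loop: stops at the first zero element, so the number
 * of iterations is unknown until the data has been read.
 * The caller must guarantee that a zero element exists. */
static int count_until_zero(const int *vec)
{
    int i;

    assert(vec != NULL);
    for (i = 0; vec[i] != 0; i++) {
        /* nothing to do; the loop only counts */
    }
    return i;
}
```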

Ira

>
> Rationale: There are many problems where it is simply not
> possible to know array sizes statically. For example, analysis of
> data base queries ("What was the mean number of shares of IBM
> traded, per share on the NYSE today?"). Data mining problems
> also fall into this category.
>
> Regards,
> Robert
>
> Ira Rosen wrote:
> >
> > gcc-help-owner@xxxxxxxxxxx wrote on 03/06/2010 09:37:01 PM:
> >
> >> Hi. I'm having a problem with GCC vectorization on an Opteron 165.
> >>
> >> I have two codes, which are, unfortunately, machine-generated
> >> and large, which differ, as far as I can tell, only in the source
> >> of the loop size, N, for a loop roughly of this form:
> >>
> >>   for( i=0; i<N; i++) {
> >>     vec[i] = i;
> >>    }
> >>
> >> In both cases, N comes from another function and is theoretically
> >> not inlined. In the first case, N is generated by an identity
> >> function that hides its value; this case vectorizes nicely,
> >> if the presence of punpckldq instructions is suitable evidence.
> >> (papiex confirms vectorization with high PAPI_VEC_INS counts.)
> >>
> >> In the other case, N comes from a sscanf, and is very well hidden,
> >> since it comes from the command line, ultimately. This case
> >> does not vectorize, at present. It did vectorize some months ago...
> >>
> >> This is on:  gcc version 4.3.2 (Ubuntu 4.3.2-1ubuntu12)
> >> Neither the compiler nor the OS has changed in that time; the
> >> code going into gcc has, of course, changed as the sac2c compiler
> >> has evolved.
> >>
> >> So, are there some subtle (or not subtle...) criteria that gcc has
> >> for deciding when to emit vector ops, based on array size, perhaps?
> >>
> >> Alternately, if someone can point me at the relevant gcc source code,
> >> maybe I can get an idea as to what's going on. Or, if there is
> >> a bugzilla site for it, I'll take a look there.
> >
> > Auto-vectorization can fail if the number of iterations can't be computed.
> > The vectorizer calls number_of_exit_cond_executions() in
> > tree-scalar-evolution.c to determine the loop bound.
> >
> > HTH,
> > Ira
> >
> >
> >> Thanks,
> >> Robert
> >>
> >>
> >
>


