Although I haven't tried this kind of thing with the newer SSE4+
instructions, with older instruction sets, using SSE ps (packed
single-precision) instructions in cases like these will, in general,
actually reduce performance. Even if you had a float4 type instead of a
float3, it's unlikely that you'd get a speed improvement using structs
like this. SSE, like most other SIMD approaches, works best with a
struct-of-arrays data layout. For a case like the one presented, the
overhead of using SSE (moving the components into SSE registers and then
summing them horizontally) will simply be too high to be worth the
benefits. You might have to think at a higher algorithmic level to make
good use of SSE.

Brian

On Thu, Apr 29, 2010 at 10:17 AM, Axel Freyn <axel-freyn@xxxxxx> wrote:
> Hi Qianqian,
>>
> First: I don't know anything about the vectorizer, so be very careful
> with my answer ;-)
>> My code looks like this:
>>
>> typedef struct CPU_float3{
>>     float x,y,z;
>> } float3;
>> float vec_dot(float3 *a,float3 *b){
>>     return a->x*b->x+a->y*b->y+a->z*b->z;
>> }
>> float pinner(float3 *Pd,float3 *Pm,float3 *Ad,float3 *Am){
>>     return vec_dot(Pd,Am)+vec_dot(Pm,Ad);
>> }
>> ...
>>
>> and then I call pinner() a lot in my main function.
>>
>> Here are my questions:
>>
>> 1. When I compile the above code with the gcc -O3 option, will the
>> above vec_dot function be translated to SSE automatically?
> I think: in general, no. The vectorizer only vectorizes loops.
> In addition, you will have to add "-ffast-math" to the compiler flags
> to allow vectorization (I think?). When you compile your code with the
> option "-ftree-vectorizer-verbose=2":
>
> gcc-4.5 -O3 -ffast-math -ftree-vectorizer-verbose=2 -c sse.c
>
> it tells you what the vectorizer is doing: nothing... (I simply
> compiled your two functions vec_dot and pinner.)
>
> However, if float3 stored its components in an array, e.g.
>
> typedef struct CPU_float3{
>     float x[3];
> } float3;
>
> and you wrote vec_dot as
>
> float vec_dot(float3 *a,float3 *b){
>     float dot=0;
>     int i;
>     for(i = 0; i < 3; ++i)
>         dot+= a->x[i]*b->x[i];
>     return dot;
> }
>
> gcc would consider it for vectorization, but not for a loop with only
> 3 iterations:
>
> sse.c:7: note: not vectorized: iteration count too small.
> sse.c:4: note: vectorized 0 loops in function.
>
> However, if you call vec_dot and pinner in a loop over adjacent array
> elements, the vectorizer might be able to vectorize that loop... Just
> try to compile your code with "-ftree-vectorizer-verbose=2" (and maybe
> "-ffast-math", if you can accept the loss of precision / weakening of
> standards compliance; see the man page).
>>
>> 2. If not, can anyone suggest an SSE instruction
>> to accelerate the above computation?
>>
>> 3. Is "inline" valid for GCC when compiling C code?
> Yes, it is. However, as long as the function is defined in the same
> compilation unit where it is used, gcc with -O3 will inline it
> automatically anyway (at least when gcc believes it to be useful :-))
>
> Axel
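A minimal sketch of the struct-of-arrays layout, assuming pinner() is really being called on many (Pd, Pm, Ad, Am) tuples that can be stored in arrays. The float3_array type and the pinner_many() function are hypothetical names for illustration, not part of GCC or of the original code. Each output element is computed independently, so unlike the 3-iteration summation loop there is no cross-iteration reduction, and GCC should not need -ffast-math to vectorize it. Whether a given GCC version actually does is worth checking with -ftree-vectorizer-verbose=2 (or -fopt-info-vec on newer releases).

```c
#include <stddef.h>

/* Hypothetical struct-of-arrays container: element i of the set is the
   vector (x[i], y[i], z[i]); each array must hold at least n floats. */
typedef struct {
    float *x;
    float *y;
    float *z;
} float3_array;

/* Computes out[i] = dot(Pd[i], Am[i]) + dot(Pm[i], Ad[i]) for i in [0, n),
   i.e. the original pinner() applied to n consecutive elements.
   Consecutive iterations read adjacent floats from each component array,
   which is the access pattern SIMD units (and the vectorizer) handle best.
   restrict promises the compiler that out does not overlap any input. */
void pinner_many(const float3_array *Pd, const float3_array *Pm,
                 const float3_array *Ad, const float3_array *Am,
                 float *restrict out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = Pd->x[i] * Am->x[i] + Pd->y[i] * Am->y[i] + Pd->z[i] * Am->z[i]
               + Pm->x[i] * Ad->x[i] + Pm->y[i] * Ad->y[i] + Pm->z[i] * Ad->z[i];
    }
}
```

With this layout one 128-bit SSE register can hold the same component of four different vectors, so four pinner() results can be computed side by side. If the data starts out as an array of float3 structs, converting it once (or allocating it this way from the start) is what could pay off; converting on every call would likely cost more than it saves.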