Hi Qianqian,

First: I don't know anything about the vectorizer, so be very careful
with my answer ;-)

> My code looks like this:
>
> typedef struct CPU_float3{
>     float x,y,z;
> } float3;
> float vec_dot(float3 *a,float3 *b){
>     return a->x*b->x+a->y*b->y+a->z*b->z;
> }
> float pinner(float3 *Pd,float3 *Pm,float3 *Ad,float3 *Am){
>     return vec_dot(Pd,Am)+vec_dot(Pm,Ad);
> }
> ...
>
> and then I call pinner() a lot in my main function.
>
> Here are my questions:
>
> 1. when I compile the above code with gcc -O3 option, will the
> above vec_dot function be translated to SSE automatically?

I think: in general not. The vectorizer only vectorizes loops. In
addition, for a floating-point sum like this one you will have to pass
"-ffast-math" to the compiler, because vectorizing the sum reorders the
additions (I think?).

When you compile your code with the option
"-ftree-vectorizer-verbose=2":

    gcc-4.5 -O3 -ffast-math -ftree-vectorizer-verbose=2 -c sse.c

it tells you what the vectorizer is doing: nothing... (I simply
compiled your two functions vec_dot and pinner.)

However, if you stored the three components as an array (float x[3]
instead of float x,y,z) and wrote vec_dot as

    float vec_dot(float3 *a,float3 *b){
        float dot=0;
        int i;
        for(i = 0; i < 3; ++i)
            dot+= a->x[i]*b->x[i];
        return dot;
    }

gcc could in principle vectorize the loop, but it does not do so for a
loop with only 3 iterations:

    sse.c:7: note: not vectorized: iteration count too small.
    sse.c:4: note: vectorized 0 loops in function.

However, if you call vec_dot and pinner in a loop over many adjacent
elements, the vectorizer might be used for that loop... Just try to
compile your code with "-ftree-vectorizer-verbose=2" (and maybe
"-ffast-math", if you can accept the loss of precision / weakening of
the standard; see the man page).

> 2. if not, anyone can suggest a SSE instruction
> to accelerate the above computation?
>
> 3. is "inline" a valid option for GCC when compiling a C code?

Yes, it is.
However, as soon as the function is defined in the same compilation
unit where it is used, gcc with -O3 will inline it automatically (at
least when gcc believes that to be useful :-))

Axel
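
P.S. A rough sketch of the "many adjacent elements" case, untested
with gcc-4.5, so treat it as an assumption rather than a recipe. If the
vectors are stored as one array per component, pinner() over n elements
becomes a plain loop with independent iterations, which is the kind of
loop the vectorizer looks for. The names float3_soa and pinner_batch
are made up for this example, and it needs -std=c99 (or newer) for
"restrict" and the loop-local declaration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure-of-arrays layout: one array per component,
   so element i of x, y and z together forms vector number i. */
typedef struct {
    const float *x, *y, *z;
} float3_soa;

/* Batched pinner: out[i] = dot(Pd[i], Am[i]) + dot(Pm[i], Ad[i])
   for every i in [0, n). "restrict" promises the compiler that out
   does not overlap the input arrays. */
void pinner_batch(float3_soa Pd, float3_soa Pm,
                  float3_soa Ad, float3_soa Am,
                  float *restrict out, size_t n)
{
    assert(out != NULL);
    for (size_t i = 0; i < n; ++i) {
        out[i] = Pd.x[i]*Am.x[i] + Pd.y[i]*Am.y[i] + Pd.z[i]*Am.z[i]
               + Pm.x[i]*Ad.x[i] + Pm.y[i]*Ad.y[i] + Pm.z[i]*Ad.z[i];
    }
}
```

Compiling this with "-O3 -ftree-vectorizer-verbose=2" should tell you
whether gcc vectorizes the loop on your machine. Each out[i] is
computed independently here, so there is no sum carried across
iterations as in the 3-iteration loop above.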