enabling SSE for 3-vector inner product

Qianqian Fang <fangqq@xxxxxxxxx> · Thu, 29 Apr 2010 12:19:30 -0400

Hi list,

I am working on a numerical code and found that
a simple inner product of float triplets takes
30% of my run time when compiled with GCC -O3.
I want to explore options for accelerating this
code further, and I have a few questions about
using SSE with GCC.

My code looks like this:

typedef struct CPU_float3 {
    float x, y, z;
} float3;
...
/* dot product of two float triplets */
float vec_dot(float3 *a, float3 *b){
    return a->x*b->x + a->y*b->y + a->z*b->z;
}
/* returns Pd.Am + Pm.Ad */
float pinner(float3 *Pd, float3 *Pm, float3 *Ad, float3 *Am){
    return vec_dot(Pd, Am) + vec_dot(Pm, Ad);
}
...

I then call pinner() many times from my main function.
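Because pinner() is called so often, per-call overhead can matter as much as the arithmetic. A minimal sketch, assuming both helpers can live in a header shared by all their callers: marking them `static inline` lets GCC expand them at each call site. GCC accepts `inline` in C99 and later modes and, as a GNU extension, in gnu89 mode; only strict `-std=c89` or `-ansi` rejects it, and `__inline__` works in every mode. `static inline` has the same meaning under both gnu89 and C99 rules. Note that -O3 already enables -finline-functions, so GCC may inline these small functions without the keyword when they are defined in the same translation unit. The `const` qualifiers are an addition, not part of the original code.

```c
/* Sketch: the original helpers marked static inline (const added). */
typedef struct CPU_float3 {
    float x, y, z;
} float3;

/* Dot product of two float triplets. */
static inline float vec_dot(const float3 *a, const float3 *b)
{
    return a->x * b->x + a->y * b->y + a->z * b->z;
}

/* Pd.Am + Pm.Ad, as in the original pinner(). */
static inline float pinner(const float3 *Pd, const float3 *Pm,
                           const float3 *Ad, const float3 *Am)
{
    return vec_dot(Pd, Am) + vec_dot(Pm, Ad);
}
```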

Here are my questions:

1. When I compile the above code with the -O3 option, will GCC
translate vec_dot() into SSE instructions automatically?

2. If not, can anyone suggest SSE instructions that would
accelerate this computation?

3. Is "inline" a valid keyword when compiling C code with GCC?
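On questions 1 and 2: -O3 enables GCC's tree vectorizer, which mainly targets loops over arrays, so a single three-term expression on 12-byte structs is unlikely to become packed SSE code (on x86-64, GCC still uses scalar SSE instructions such as mulss and addss for float math). SSE4.1 adds a dedicated dot-product instruction, DPPS (`_mm_dp_ps` in `<smmintrin.h>`, which needs `-msse4.1`). The sketch below sticks to baseline SSE intrinsics from `<xmmintrin.h>`; the name `vec_dot_sse` is hypothetical. For one dot product at a time, packing the operands and doing the horizontal sum often cost as much as they save, so it should be timed against the scalar version before being adopted.

```c
#include <xmmintrin.h>  /* SSE intrinsics */

typedef struct CPU_float3 {
    float x, y, z;
} float3;

/* Hypothetical SSE version of vec_dot(). The terms are summed as
   (xx + zz) + yy rather than (xx + yy) + zz, so results may differ
   from the scalar code in the last bit. */
static inline float vec_dot_sse(const float3 *a, const float3 *b)
{
    /* Put each triplet in lanes 0..2 with lane 3 = 0. Loading the fields
       individually avoids reading past the end of a 12-byte struct. */
    __m128 va = _mm_set_ps(0.0f, a->z, a->y, a->x);
    __m128 vb = _mm_set_ps(0.0f, b->z, b->y, b->x);
    __m128 p  = _mm_mul_ps(va, vb);                 /* [xx, yy, zz, 0] */
    __m128 s  = _mm_add_ps(p, _mm_movehl_ps(p, p)); /* lane0 = xx+zz, lane1 = yy */
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, _MM_SHUFFLE(1, 1, 1, 1))); /* lane0 += lane1 */
    return _mm_cvtss_f32(s);                        /* extract lane 0 */
}
```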

Any suggestions for improving efficiency are
highly appreciated.
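A larger gain usually comes from computing several results per instruction rather than vectorizing a single dot product. A minimal sketch, assuming the calling code can store the vectors as a structure of arrays (all x components contiguous, then all y, then all z) and evaluate many pinner() calls as a batch: each SSE instruction then works on four independent pinner() results, with no horizontal sums or shuffles. The type `float3_soa` and the function `pinner_soa` are hypothetical names introduced for illustration.

```c
#include <stddef.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical structure-of-arrays layout: vector i is (x[i], y[i], z[i]). */
typedef struct {
    float *x, *y, *z;
} float3_soa;

/* out[i] = Pd[i].Am[i] + Pm[i].Ad[i] for 0 <= i < n, four results per step. */
void pinner_soa(const float3_soa *Pd, const float3_soa *Pm,
                const float3_soa *Ad, const float3_soa *Am,
                float *out, size_t n)
{
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        /* a = Pd.Am for four consecutive entries */
        __m128 a = _mm_mul_ps(_mm_loadu_ps(Pd->x + i), _mm_loadu_ps(Am->x + i));
        a = _mm_add_ps(a, _mm_mul_ps(_mm_loadu_ps(Pd->y + i), _mm_loadu_ps(Am->y + i)));
        a = _mm_add_ps(a, _mm_mul_ps(_mm_loadu_ps(Pd->z + i), _mm_loadu_ps(Am->z + i)));
        /* b = Pm.Ad for the same four entries */
        __m128 b = _mm_mul_ps(_mm_loadu_ps(Pm->x + i), _mm_loadu_ps(Ad->x + i));
        b = _mm_add_ps(b, _mm_mul_ps(_mm_loadu_ps(Pm->y + i), _mm_loadu_ps(Ad->y + i)));
        b = _mm_add_ps(b, _mm_mul_ps(_mm_loadu_ps(Pm->z + i), _mm_loadu_ps(Ad->z + i)));
        _mm_storeu_ps(out + i, _mm_add_ps(a, b));
    }
    /* Scalar tail for the remaining n % 4 entries, same summation order. */
    for (; i < n; i++) {
        float a = Pd->x[i] * Am->x[i] + Pd->y[i] * Am->y[i] + Pd->z[i] * Am->z[i];
        float b = Pm->x[i] * Ad->x[i] + Pm->y[i] * Ad->y[i] + Pm->z[i] * Ad->z[i];
        out[i] = a + b;
    }
}
```

If the arrays are 16-byte aligned, `_mm_load_ps` and `_mm_store_ps` can replace the unaligned variants. A plain C loop over the same layout is also a reasonable candidate for GCC's -O3 auto-vectorizer, which may make the hand-written intrinsics unnecessary.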

Thanks,

Qianqian