Re: enabling SSE for 3-vector inner product


 



Hi Marc,

On 04/30/2010 06:31 AM, Marc Glisse wrote:
On Thu, 29 Apr 2010, Qianqian Fang wrote:

Shouldn't there be some magic here for alignment purposes?

Thank you for pointing this out. I changed the definition to

typedef struct CPU_float4{
    float x,y,z,w;
} float4 __attribute__ ((aligned(16)));

but the run time of the SSE3 version stays the same.
Is the change above correct?
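A minimal sketch of one way to check such a definition at compile time, assuming a C11 compiler (for _Static_assert and _Alignof) and GCC/Clang attribute syntax. Here the attribute is placed on the struct itself; GCC also accepts it on the typedef name as written above. load_float4 is a hypothetical helper: once 16-byte alignment is guaranteed, the aligned load _mm_load_ps can be used instead of _mm_loadu_ps. Heap arrays need an aligned allocator such as posix_memalign or aligned_alloc for the guarantee to hold.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_load_ps, _mm_cvtss_f32 */

/* same layout as the float4 above; attribute placed on the struct */
typedef struct CPU_float4 {
    float x, y, z, w;
} __attribute__((aligned(16))) float4;

/* compile-time checks (C11): the build fails if the layout is not what SSE expects */
_Static_assert(sizeof(float4) == 16, "float4 must occupy exactly 16 bytes");
_Static_assert(_Alignof(float4) == 16, "float4 must be 16-byte aligned");

/* hypothetical helper: aligned 16-byte load, valid only because of the attribute */
static inline __m128 load_float4(const float4 *p){
    return _mm_load_ps(&p->x);
}
```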


Now I am trying to use the SSE4.1 DPPS instruction, but gcc gives me an
error. I don't know whether I used it with the wrong syntax.

Did you try using the intrinsic _mm_dp_ps?

Yes, I removed the asm and used _mm_dp_ps; it works now.
The code now looks like this:

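#include <smmintrin.h>  /* SSE4.1 intrinsics such as _mm_dp_ps; build with -msse4.1 */

/* float3 is assumed to be a struct of three floats followed by at least
   one more readable float in memory: each load reads 16 bytes */
static inline float vec_dot(const float3 *a, const float3 *b){
        float dot;
        __m128 na,nb,res;
        na=_mm_loadu_ps((const float*)a);   /* unaligned load: x, y, z, (w) */
        nb=_mm_loadu_ps((const float*)b);
        /* mask 0x7f: bits 4-6 multiply x, y, z only (w is ignored);
           bits 0-3 write the sum into all four lanes */
        res=_mm_dp_ps(na,nb,0x7f);
        _mm_store_ss(&dot,res);             /* keep the lowest lane */
        return dot;
}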

Sadly, using SSE4 gave me only a few percent (2~5%)
speed-up over the original C code. My profiling
indicated that the inner product takes about 30% of the total
run time. Does this speed-up make sense?
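One way to judge whether 2~5% makes sense is Amdahl's law: if a fraction f of the run time is sped up by a factor s, the whole program speeds up by 1 / ((1 - f) + f / s). The sketch below (function names are hypothetical) computes that bound and inverts it to estimate how much faster the inner product itself became, assuming the 30% profile figure is accurate and nothing else in the program changed.

```c
/* Amdahl's law: overall speedup when a fraction f of the run time
   is accelerated by a factor s (s < 1 means that part got slower) */
static double amdahl_speedup(double f, double s){
    return 1.0 / ((1.0 - f) + f / s);
}

/* inverse: the speedup of the accelerated part implied by an observed
   overall speedup S; only meaningful while S < 1 / (1 - f) */
static double kernel_speedup(double f, double S){
    return f / (1.0 / S - (1.0 - f));
}
```

With f = 0.3 the ceiling is 1/0.7, about 1.43x, even if the inner product cost nothing. An overall gain of 1.02 to 1.05 corresponds to the inner product getting roughly 1.07x to 1.19x faster, and an overall factor of 0.4 (the 2.5x slowdown reported for SSE3 in this thread) corresponds to the inner product running about 6 times slower.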

               "dpps %%xmm0, %%xmm1, 0xF1 \n\t"

Maybe the order of the arguments is reversed in asm and it likes a $ before a constant (and it prefers fewer parentheses on the next line).


With gcc -S, I can see that the generated assembly is in fact
dpps 127, xmm1, xmm0 (immediate first), so the operands were
probably reversed in my previous version.
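For reference, a hedged sketch of the inline-asm form Marc describes, in GCC's default AT&T syntax: the immediate comes first with a $ prefix, then the source, then the destination, which is the reverse of Intel's "dpps xmm1, xmm2, imm8". The mask 0x71 multiplies only x, y, z and writes the sum to the lowest lane; the original 0xF1 would also include w. vec_dot_asm is a hypothetical name, and the sketch assumes an x86-64 target, a CPU with SSE4.1, and at least 4 readable floats behind each pointer.

```c
#include <xmmintrin.h>  /* SSE: __m128, _mm_loadu_ps, _mm_cvtss_f32 */

/* hypothetical helper: DPPS through GCC extended inline asm (AT&T order) */
static inline float vec_dot_asm(const float *a, const float *b){
    __m128 na = _mm_loadu_ps(a);   /* a[0..3]; a[3] is ignored by the mask */
    __m128 nb = _mm_loadu_ps(b);
    /* AT&T: dpps $imm8, src, dst  ->  dst = masked dot product of dst and src.
       "+x": na is kept in an XMM register and is both read and written. */
    __asm__("dpps $0x71, %1, %0" : "+x"(na) : "x"(nb));
    return _mm_cvtss_f32(na);      /* lowest lane holds x*x' + y*y' + z*z' */
}
```

The intrinsic _mm_dp_ps remains the simpler choice, since the compiler then picks the registers and can schedule around the instruction.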


In any case, you shouldn't get a factor 2 compared to the SSE3 version, so that won't be enough for you.

Well, as I mentioned earlier, using SSE3 made my code 2.5x slower, not faster.
SSE4 is now 2~5% faster, but the gain is still smaller than I expected.
I guess that's probably the best I can do with it, right?
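For comparison, an SSE3 dot product of a single 3-vector often looks something like the sketch below. It is not the SSE3 code measured in this thread (that code is not quoted in this message). vec_dot_sse3 is a hypothetical name, and the sketch assumes x86-64 with GCC or Clang (the target attribute lets it compile without -msse3) and 4 readable floats behind each pointer. Each call needs a multiply, a mask to drop w, and two horizontal adds, so there is little arithmetic to amortize the cost of getting three floats into a register; on many x86 cores haddps is also slower than a plain addps.

```c
#include <pmmintrin.h>  /* SSE3: _mm_hadd_ps (also pulls in the SSE/SSE2 headers) */

/* hypothetical helper: 3-component dot product with SSE3 horizontal adds */
__attribute__((target("sse3")))
static inline float vec_dot_sse3(const float *a, const float *b){
    /* lane-wise products: ax*bx, ay*by, az*bz, aw*bw */
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    /* _mm_set_epi32 lists lanes from high to low: clear lane 3 (w) */
    __m128 mask = _mm_castsi128_ps(_mm_set_epi32(0, -1, -1, -1));
    prod = _mm_and_ps(prod, mask);
    prod = _mm_hadd_ps(prod, prod);   /* (x+y, z+0, x+y, z+0) */
    prod = _mm_hadd_ps(prod, prod);   /* x+y+z in every lane */
    return _mm_cvtss_f32(prod);
}
```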

Thanks,

Qianqian
