Ankit,
here is the objdump of the .o and it is definitely a movq instruction.
765: 0f 6f c1       movq   %mm1,%mm0
768: 0f 73 d0 20    psrlq  $0x20,%mm0
76c: 0f fe c8       paddd  %mm0,%mm1
76f: 0f 7f 4d e8    movq   %mm1,0xffffffe8(%ebp)
773: 8b 45 e8       mov    0xffffffe8(%ebp),%eax
My compile options are -march=pentium4 -mfpmath=sse -msse2 -O3
I should have mentioned the gcc version I am on, so we can check whether you are on a later one.
gcc (GCC) 3.4.1 (cygming special)
Copyright (C) 2004 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Jim
hi
--- James HAUXWELL <james.hauxwell@xxxxxx> wrote:
Hi,
I have a piece of code using MMX/SSE intrinsics:
ta = __builtin_ia32_pmaddwd(ia, one);
tb = (v2si)__builtin_ia32_psrlq((di)ta, 32);
dest.__v = __builtin_ia32_paddd(ta, tb);
satd = dest.__a[0];
At the point where I move the bottom 32 bits of the
MMX register to a normal register
Well, I have a doubt about this, i.e. whether this is really happening or not, because I am also using the gcc compiler and the movd instruction works for me. If you get it working, let me know as well.
Thanks
I should be able to use a movd
instruction (according to the Intel documentation), but whatever I do I can't
generate one. It is currently generating a movq to a memory location
and then doing a shorter load from the same location.
It should be something like:
ta = __builtin_ia32_pmaddwd(ia, one);
tb = (v2si)__builtin_ia32_psrlq((di)ta, 32);
satd = (int)__builtin_ia32_paddd(ta, tb);
Is anyone familiar enough with intrinsics to know why this doesn't work?
Jim
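
For reference, the same reduction can be sketched with the portable intrinsics from <mmintrin.h> instead of the raw __builtin_ia32_* calls. A direct (int) cast of the vector result is rejected because GCC only converts between vector and scalar types of the same size; _mm_cvtsi64_si32 is the intrinsic documented as the movd equivalent. This is only a sketch: the helper name hsum4_pi16 and its scalar arguments are made up for illustration, and whether a particular GCC release actually emits movd here rather than a store and reload has not been checked.

```c
#include <mmintrin.h>

/* Hypothetical helper: sum four signed 16-bit values using MMX. */
static int hsum4_pi16(short w0, short w1, short w2, short w3)
{
    __m64 ia  = _mm_set_pi16(w3, w2, w1, w0); /* pack the four words */
    __m64 one = _mm_set1_pi16(1);             /* multiplier of 1 in each lane */

    __m64 ta  = _mm_madd_pi16(ia, one);       /* pmaddwd: two 32-bit pair sums */
    __m64 tb  = _mm_srli_si64(ta, 32);        /* psrlq: move high sum to low half */
    __m64 sum = _mm_add_pi32(ta, tb);         /* paddd: total in the low 32 bits */

    int satd = _mm_cvtsi64_si32(sum);         /* extract the low 32 bits */
    _mm_empty();                              /* emms: leave MMX state clean */
    return satd;
}
```

Calling hsum4_pi16(1, 2, 3, 4) should return 10. Compiling with -S, or running objdump -d on the object file, shows whether the extraction became a movd or a movq to the stack followed by a 32-bit load.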