X86 Built-in Functions

Jay Groven <grovenjl@xxxxxxxxxxxxxxx> · Mon, 2 Feb 2004 17:10:16 -0500

All of the SSE functions listed in the GCC manual return a value, e.g. we have

v4sf __builtin_ia32_mulps (v4sf, v4sf)

which returns the component-wise product of the two given vectors.  However,
the actual SSE instruction mulps is accumulator-based: it overwrites its
first operand with the result.  This seems to make GCC use quite a few
temporary registers when you call the builtin, since it is trying to produce
a return value for an instruction that doesn't work that way.  Is there any
set of GCC builtins that accumulates into an operand, rather than returning
a value?  That would really be nice, since that's how the instructions
actually work, and that's how I want to use them.  Thanks for any
feedback.
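
For illustration, the usage pattern in question might look like the sketch below. It is not taken from the GCC manual: the names mul4 and product_of are hypothetical, and the fallback to the generic vector operator is an assumption added so the sketch also builds with compilers or targets that lack the builtin.

```c
#include <stddef.h>

/* Four packed floats, declared with GCC's vector_size attribute. */
typedef float v4sf __attribute__((vector_size(16)));

/* Hypothetical wrapper: component-wise product of two vectors.
   Uses the builtin from the manual when compiling with GCC for an SSE
   target; otherwise falls back to the generic vector operator
   (an assumption, added only for portability of this sketch). */
static inline v4sf mul4(v4sf a, v4sf b)
{
#if defined(__GNUC__) && !defined(__clang__) && defined(__SSE__)
    return __builtin_ia32_mulps(a, b);
#else
    return a * b;
#endif
}

/* Hypothetical helper: multiply a running value by each of the n
   vectors in xs, assigning every result back to the same variable. */
v4sf product_of(const v4sf *xs, size_t n)
{
    v4sf acc = {1.0f, 1.0f, 1.0f, 1.0f};
    for (size_t i = 0; i < n; ++i)
        acc = mul4(acc, xs[i]);   /* result written back over acc */
    return acc;
}
```

Because the result is assigned back to acc, the source already expresses the accumulation; whether the compiler then keeps acc in a single register is up to its register allocator, and is best checked by reading the assembly produced with -S at the optimization level in use.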

P.S. Please reply to all, since I'm not subscribed to this list.