Jay Groven <grovenjl@xxxxxxxxxxxxxxx> writes:

> All of the SSE functions listed in the gcc manual return a value, i.e. we have
>
> v4sf __builtin_ia32_mulps (v4sf, v4sf)
>
> which returns the component-wise product of the two given vectors. However,
> the actual sse instruction mulps is accumulator-based. This seems to make
> gcc use quite a few temp registers when you call mulps, since it's trying to
> give a return value for an instruction that doesn't work that way.

The register allocator should do this for you. Do you have concrete code
where this doesn't work?

--
Falk
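A minimal test case in the spirit of that question, offered as a sketch rather than a definitive answer. Assumptions: it uses GCC's generic vector extension `*` operator on a `v4sf` typedef instead of calling `__builtin_ia32_mulps` directly (both express the same value-returning multiply), and the function names `mul4` and `scale_in_place` are hypothetical. Compile with `gcc -O2 -S` and read the assembly; the exact output depends on the GCC version and flags.

```c
/* GCC vector extension (also accepted by Clang): four packed floats,
   the same layout as an SSE register and as the manual's v4sf. */
typedef float v4sf __attribute__((vector_size(16)));

/* Value-returning form, like __builtin_ia32_mulps.  On x86-64 the
   arguments arrive in %xmm0 and %xmm1 and the result goes back in
   %xmm0; since a is dead after the multiply, GCC at -O2 can typically
   emit a single "mulps %xmm1, %xmm0" with no temporary register. */
v4sf mul4(v4sf a, v4sf b)
{
    return a * b;
}

/* In-place scaling over an array: each a[i] is overwritten by the
   product, which matches the destructive two-operand mulps directly. */
void scale_in_place(v4sf *a, const v4sf *b, int n)
{
    for (int i = 0; i < n; i++)
        a[i] = a[i] * b[i];
}
```

The point is that the value-returning form only describes dataflow. When one input is dead after the multiply, the allocator can give the result that input's register; an extra copy (a `movaps` into a temporary) should only be needed when both inputs are still live afterwards, e.g. `c = a * b;` followed by further uses of both `a` and `c`.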