Hello,

I get 10% lower performance because of an assembly instruction that seems to be useless. I have a lot of blocks like the following (exactly the same) in my code:

    callq  400a50 <expf@plt>
    movaps %xmm0,%xmm4
    movss  (%r15),%xmm0
    movss  %xmm4,0x40(%rsp)
    callq  400a50 <expf@plt>
    ...

and at the end of them:

    movss  0x40(%rsp),%xmm4

%xmm4 is not read in between, so I don't understand why gcc generates the second instruction (the movaps) instead of something like this:

    movss  %xmm0,0x40(%rsp)
    movss  (%r15),%xmm0
    callq  400a50 <expf@plt>
    ...
    movss  0x40(%rsp),%xmm4

Thank you in advance.

Best regards.

--
Diego Caballero