On 05/19/2011 06:47 PM, Diego Caballero wrote:
> Hello,
>
> I get 10% lower performance due to an assembly instruction that seems
> to be useless.
> I have a lot of blocks like the following (exactly the same) in my code:
>
>    callq  400a50 <expf@plt>
>    movaps %xmm0,%xmm4
>    movss  (%r15),%xmm0
>    movss  %xmm4,0x40(%rsp)
>    callq  400a50 <expf@plt>
>    ...
>
> and at the end of them:
>
>    movss  0x40(%rsp),%xmm4
>
> %xmm4 is not read in the middle, so I don't understand why gcc generates
> the second instruction, instead of something like this:
>
>    movss  %xmm0,0x40(%rsp)
>    movss  (%r15),%xmm0
>    callq  400a50 <expf@plt>
>    ...
>    movss  0x40(%rsp),%xmm4
>
> Thank you in advance.

Hi Diego,

Can you make a small test case for us to try? It should be the smallest
program that demonstrates the problem.

Andrew.