I can't get used to the lack of a Reply-To field for gcc-help.

On Tue, Aug 24, 2010 at 00:33, Segher Boessenkool
<segher@xxxxxxxxxxxxxxxxxxx> wrote:
>> inline uint64x64_t &operator += (uint64x64_t &lhs, const uint64x64_t &rhs)
>> {
>>   lhs._v.hi += rhs._v.hi;
>>   lhs._v.lo += rhs._v.lo;
>>   if (lhs._v.hi < rhs._v.lo)
>
> if (lhs._v.lo < rhs._v.lo)

gah :/

>
>>     {
>>       lhs._v.hi++;
>>     }
>>   return lhs;
>> }
>
> Does the generated code look any better with that correction? If not, you
> want to tell us the exact command line and GCC version you used.

[mlacage@diese simulator]$ gcc --version
gcc (GCC) 4.3.2 20081105 (Red Hat 4.3.2-7)

Yes, it looks much better. I get:

 804b850:  8b 55 e4        mov    -0x1c(%ebp),%edx
 804b853:  8b 7d 08        mov    0x8(%ebp),%edi
 804b856:  8b 44 0a 08     mov    0x8(%edx,%ecx,1),%eax
 804b85a:  8b 54 0a 0c     mov    0xc(%edx,%ecx,1),%edx
 804b85e:  01 47 08        add    %eax,0x8(%edi)
 804b861:  8b 45 e4        mov    -0x1c(%ebp),%eax
 804b864:  11 57 0c        adc    %edx,0xc(%edi)
 804b867:  03 1c 08        add    (%eax,%ecx,1),%ebx
 804b86a:  13 74 08 04     adc    0x4(%eax,%ecx,1),%esi
 804b86e:  89 1f           mov    %ebx,(%edi)
 804b870:  89 77 04        mov    %esi,0x4(%edi)
 804b873:  3b 74 08 04     cmp    0x4(%eax,%ecx,1),%esi
 804b877:  77 12           ja     804b88b <ns3::uint64x64_t run_add<ns3::uint64x64_t>(ns3::uint64x64_t, ns3::uint64x64_t, long)+0xfb>

But the above is not as good as the following quick hack, which is most
likely not correct but should be close to the minimal code:

  asm ("mov 0(%0),%%eax\n\t"
       "add 0(%1),%%eax\n\t"
       "mov %%eax,0(%0)\n\t"
       "mov 4(%0),%%eax\n\t"
       "adc 4(%1),%%eax\n\t"
       "mov %%eax,4(%0)\n\t"
       "mov 8(%0),%%eax\n\t"
       "adc 8(%1),%%eax\n\t"
       "mov %%eax,8(%0)\n\t"
       "mov 12(%0),%%eax\n\t"
       "adc 12(%1),%%eax\n\t"
       "mov %%eax,12(%0)\n\t"
       :
       : "r" (&lhs._v), "r" (&rhs._v)
       : "%eax", "cc");

I get around 3.5 ns for the hand-coded assembly versus 5.1 ns for the
compiler-generated code.

Mathieu
-- 
Mathieu Lacage <mathieu.lacage@xxxxxxxxx>