Ian Lance Taylor wrote:
> I don't think you said what the missed optimization was.  You also
> didn't mention which version of gcc you are using, or what target you
> are compiling for.

Testcase #2 shows how __builtin_memcmp is not moved outside the loop.
This is what GCC 4.7.0, target i686-pc-linux-gnu, option -O3, generates:

main:
        pushl   %ebp
        movl    _ZL2s1+12, %edx
        pushl   %edi
        pushl   %esi
        xorl    %esi, %esi
        cmpl    _ZL2s2+12, %edx
        pushl   %ebx
        je      .L11
.L2:
        popl    %ebx
        movl    %esi, %eax
        popl    %esi
        popl    %edi
        popl    %ebp
        ret
.L11:
        movl    $18, %ebp
        movl    $1, %ebx
.L7:
        movl    $_ZL2s1, %esi
        movl    %edx, %ecx
        cmpl    %edx, %edx
        movl    $_ZL2s2, %edi
        repz cmpsb
        movl    $0, %esi
        setb    %cl
        seta    %al
        subb    %cl, %al
        movl    %ebx, %ecx
        movsbl  %al, %eax
        movzbl  %cl, %ecx
        testl   %eax, %eax
        cmovne  %esi, %ebx
        testl   %eax, %eax
        cmovne  %esi, %ecx
        subl    $1, %ebp
        jne     .L7
        movl    %ecx, %esi
        jmp     .L2

Now what happens without "str1.size == str2.size" in operator==:

main:
        movl    _ZL2s1+12, %ecx
        movl    $18, %edx
        movl    $1, %eax
        pushl   %edi
        movl    $_ZL2s2, %edi
        pushl   %esi
        movl    $_ZL2s1, %esi
        cmpl    %ecx, %ecx
        repz cmpsb
        sete    %cl
        movzbl  %cl, %ecx
.L2:
        andl    %ecx, %eax
        subl    $1, %edx
        jne     .L2
        movzbl  %al, %eax
        popl    %esi
        popl    %edi
        ret

The loop, however, is still here, which is also exemplified by
testcase #1:

main:
        movl    _ZL1n, %ecx
        movl    $18, %edx
        movl    $1, %eax
.L2:
        andl    %ecx, %eax
        subl    $1, %edx
        jne     .L2
        rep ret

Tested versions 4.4.4 and later.

Dmitry
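
For reference, here is a minimal sketch of the kind of code under discussion. It is a hypothetical reduction, not the actual testcase #2: the type Str and the functions compare_in_loop and compare_hoisted are made-up names, and the struct layout is only an assumption chosen to resemble the s1+12 size access in the listings. The second function shows the hand-hoisted form one would hope the optimizer derives from the first.

```cpp
// Hypothetical reduction for illustration; NOT the original testcase.
#include <cstddef>

struct Str {               // assumed string-like type with an inline buffer
    char data[12];
    std::size_t size;      // on a 32-bit target this sits at offset 12
};

// Equality with the size check in front of the memcmp, as described
// in the message ("str1.size == str2.size" in operator==).
static bool operator==(const Str& a, const Str& b) {
    return a.size == b.size
        && __builtin_memcmp(a.data, b.data, a.size) == 0;
}

// The comparison is loop-invariant but written inside the loop.
static bool compare_in_loop(const Str& a, const Str& b, int n) {
    bool r = true;
    for (int i = 0; i < n; ++i)
        r &= (a == b);
    return r;
}

// Same result with the invariant comparison hoisted by hand.
static bool compare_hoisted(const Str& a, const Str& b, int n) {
    const bool eq = (a == b);
    bool r = true;
    for (int i = 0; i < n; ++i)
        r &= eq;
    return r;
}
```

Both functions return the same value for any inputs; the only difference is where the comparison is evaluated, which is the transformation the listings show GCC not performing in the first case.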