On Wed, Sep 26, 2012 at 04:20:52PM -0700, Ian Lance Taylor wrote:
> On Wed, Sep 26, 2012 at 10:34 AM, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
>
> > is there a reason why for example
> > x=x|(1<<11);
> > is not expanded into
> > bts rax,11
> > ?
>
> The bts instruction is never faster than the corresponding or
> instruction.  There's no reason to use it when setting a bit in the
> low 32 bits.
>
> Ian

The following benchmark says otherwise. On Ivy Bridge the bts variant is
twice as fast as the or variant.

I used

for(i=0;i<1000000;i++) x=x|(1<<i);

implemented as

        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        xorl    %eax, %eax
        xorl    %ecx, %ecx
        movl    $1, %edx
        .p2align 4,,10
        .p2align 3
.L2:
        bts     %ecx, %edx
        addl    $1, %ecx
        cmpl    $100000000, %ecx
        jne     .L2
        rep
        ret
        .cfi_endproc

and

        .globl  main
        .type   main, @function
main:
.LFB0:
        .cfi_startproc
        xorl    %eax, %eax
        xorl    %ecx, %ecx
        movl    $1, %edx
        .p2align 4,,10
        .p2align 3
.L2:
        movl    %edx, %esi
        sall    %cl, %esi
        addl    $1, %ecx
        orl     %esi, %eax
        cmpl    $100000000, %ecx
        jne     .L2
        rep
        ret
        .cfi_endproc
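
For anyone who wants to reproduce this from C rather than from hand-written
assembly, here is a rough sketch of the same loop. It is not the code that
was timed above. The names set_bit, run_loop and time_loop are made up for
illustration. The index is masked with 31 because 1<<i is undefined in C
once i reaches the width of int, whereas the register forms of both bts and
sall take the bit index or shift count modulo 32 in hardware. Whether the
compiler emits bts or a shift plus or for this source depends on the
compiler version and flags, so check the generated assembly before drawing
conclusions from the timing.

```c
#include <stdint.h>
#include <time.h>

/* Set bit (i mod 32) of x. The "& 31" mirrors what the register forms of
   bts and sall do in hardware and keeps the shift well defined in C. */
static uint32_t set_bit(uint32_t x, unsigned i)
{
    return x | (UINT32_C(1) << (i & 31u));
}

/* Hypothetical helper: n dependent bit-set operations, as in the loops
   above. Returns the accumulated value so the work has a visible result. */
static uint32_t run_loop(unsigned n)
{
    uint32_t x = 0;
    for (unsigned i = 0; i < n; i++)
        x = set_bit(x, i);
    return x;
}

/* Hypothetical helper: CPU time in seconds for run_loop(n), measured with
   clock(). The loop result is stored through *result. An optimizing
   compiler may still simplify the loop, so inspect the output of -S. */
static double time_loop(unsigned n, uint32_t *result)
{
    clock_t start = clock();
    *result = run_loop(n);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_loop(100000000, &r) and printing both the returned time and r
gives a number comparable to timing the assembly versions, provided the
generated loop really contains the instruction sequence you mean to measure.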