Re: Using bt,bts

On Thu, Sep 27, 2012 at 12:35 AM, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
> On Wed, Sep 26, 2012 at 04:20:52PM -0700, Ian Lance Taylor wrote:
>> On Wed, Sep 26, 2012 at 10:34 AM, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
>>
>> > is there a reason why for example
>> > x=x|(1<<11);
>> > is not expanded into
>> > bts rax,11
>> > ?
>>
>> The bts instruction is never faster than the corresponding or
>> instruction.  There's no reason to use it when setting a bit in the
>> low 32 bits.
>>
>> Ian
> The following benchmark says otherwise. On Ivy Bridge the bts variant is
> twice as fast as the or variant.
>
> I used
>
>  for(i=0;i<1000000;i++)
>     x=x|(1<<i);

That is a rather odd benchmark.  Almost all of the loop iterations
will do nothing because the 1 will be left shifted into nothingness.
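
To illustrate the point: in C, shifting a 32-bit value by 32 or more is
undefined, so a loop that should touch a real bit on every iteration has
to keep the shift count in range. A minimal sketch, assuming it is
acceptable to wrap the count to 0..31; the function name
set_bits_in_range is hypothetical and not from the original benchmark:

```c
#include <stdint.h>

/* Hypothetical benchmark kernel (not the original code): sets bit
   (i mod 32) on every iteration, so the shift count always stays
   within the width of the 32-bit operand. */
static uint32_t set_bits_in_range(uint32_t x, unsigned iterations)
{
    for (unsigned i = 0; i < iterations; i++)
        x |= UINT32_C(1) << (i & 31);   /* count is always 0..31 */
    return x;
}
```

Whether this changes the timing comparison would have to be measured;
the sketch only makes each iteration well defined.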

And if you look back at what I said, I said they were equivalent when
setting one of the low order 32 bits, which is what was happening in
your original code.
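
The distinction can be sketched in C. On x86-64 the immediate operand of
or is at most 32 bits (sign-extended for 64-bit operands), so a mask for
a low bit such as bit 11 fits directly in the instruction, while a mask
for a high bit such as bit 40 does not. The function names below are
hypothetical, and what a given compiler actually emits for them is an
assumption to check against its output:

```c
#include <stdint.h>

/* Mask 0x800 fits in a 32-bit immediate, so a plain `or` with an
   immediate operand is enough here. */
static uint64_t set_bit_11(uint64_t x)
{
    return x | (UINT64_C(1) << 11);
}

/* Mask 0x10000000000 does not fit in a 32-bit immediate; a compiler
   has to load the constant into a register first, or may choose an
   instruction such as bts instead (assumption, compiler dependent). */
static uint64_t set_bit_40(uint64_t x)
{
    return x | (UINT64_C(1) << 40);
}
```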


> implemented as
>
> .globl main
>   .type main, @function
> main:
> .LFB0:
>   .cfi_startproc
>   xorl  %eax, %eax
>   xorl  %ecx, %ecx
>   movl  $1, %edx
>   .p2align 4,,10
>   .p2align 3
> .L2:
>   bts %ecx, %edx
>   addl  $1, %ecx
>   cmpl  $100000000, %ecx
>   jne .L2
>   rep
>   ret
> .cfi_endproc
>
> and
>
> .globl main
>   .type main, @function
> main:
> .LFB0:
>   .cfi_startproc
>   xorl  %eax, %eax
>   xorl  %ecx, %ecx
>   movl  $1, %edx
>   .p2align 4,,10
>   .p2align 3
> .L2:
>   movl  %edx, %esi
>   sall  %cl, %esi
>   addl  $1, %ecx
>   orl %esi, %eax
>   cmpl  $100000000, %ecx
>   jne .L2
>   rep
>   ret
> .cfi_endproc

Those loops are not equivalent even apart from bts vs. or.  The loop body
of one has four instructions, the other has six.

Ian


