Re: Using bt,bts

On Fri, Sep 28, 2012 at 8:40 AM, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
> On Thu, Sep 27, 2012 at 10:52:48AM -0700, Ian Lance Taylor wrote:
>> On Thu, Sep 27, 2012 at 12:35 AM, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
>> > On Wed, Sep 26, 2012 at 04:20:52PM -0700, Ian Lance Taylor wrote:
>> >> On Wed, Sep 26, 2012 at 10:34 AM, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
>> >>
>> >> > is there a reason why for example
>> >> > x=x|(1<<11);
>> >> > is not expanded into
>> >> > bts rax,11
>> >> > ?
>> >>
>> >> The bts instruction is never faster than the corresponding or
>> >> instruction.  There's no reason to use it when setting a bit in the
>> >> low 32 bits.
>> >>
>> >> Ian
>> > The following benchmark says otherwise. On Ivy Bridge the bts variant
>> > is twice as fast as the or variant.
>> >
>> > I used
>> >
>> >  for(i=0;i<1000000;i++)
>> >     x=x|(1<<i);
>>
>> That is a rather odd benchmark.  Almost all of the loop iterations
>> will do nothing because the 1 will be left shifted into nothingness.

> From the Intel reference manual:

Sure, I know.  But I don't see why it is relevant.  This is C.  If you
want to test machine instructions, write assembly code.
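
A minimal sketch of the C-level point, not code from this thread: in C,
shifting an int by a count equal to or greater than its width is
undefined, whatever the hardware shift instruction does with the count.
The helper names set_bit_11 and set_bit_mod64 are hypothetical, and the
explicit mask assumes that wrapping the index modulo 64 is the intended
behaviour.

```c
#include <stdint.h>

/* Hypothetical helper: the constant case from the original question,
   x = x | (1 << 11), written on a 64-bit unsigned type. */
uint64_t set_bit_11(uint64_t x)
{
    return x | (UINT64_C(1) << 11);
}

/* Hypothetical helper: variable bit position.  The index is masked to
   0..63 so the shift is well defined for every value of i (assumption:
   wrap-around modulo 64 is what the caller wants). */
uint64_t set_bit_mod64(uint64_t x, unsigned i)
{
    return x | (UINT64_C(1) << (i & 63u));
}
```

Whether the compiler turns either helper into or, bts, or something
else is its own choice; the C source only fixes the result.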

>> And if you look back at what I said, I said they were equivalent when
>> setting one of the low order 32 bits, which is what was happening in
>> your original code.
> I did not say that I set the lower 32 bits, nor did I say that the
> position I set is constant.

Well, I tried to answer the question you posed.  You now seem to be
asking a different question.  Perhaps it has a different answer.  But
I'm not sure exactly what question you are asking.

>> Those loops are not equivalent even apart from bts vs. ori.  One has
>> four instructions, the other has six.
> Two functions are equivalent if and only if they produce the same
> output for every input. That one consists of 10 instructions while the
> other consists of 8 is irrelevant.

I thought the point of your example was a micro-benchmark to show that
bts is faster than ori.  For a micro-benchmark of a single
instruction, it's highly relevant whether other instructions are being
executed.  I apologize if I misunderstood the point of your test case.
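
As a rough sketch of what a like-for-like comparison could look like,
assuming GCC-style inline assembly on x86-64: the two loops below differ
only in the one instruction under test, so any timing difference cannot
come from extra surrounding instructions. The function names loop_or and
loop_bts are hypothetical, the constant bit 11 is taken from the
original question, and the plain C fallback exists only so the code
builds on other targets.

```c
#include <stdint.h>

/* Hypothetical: n iterations of "or" with an immediate for bit 11. */
uint64_t loop_or(uint64_t x, unsigned long n)
{
    for (unsigned long i = 0; i < n; i++) {
#if defined(__x86_64__) && defined(__GNUC__)
        /* volatile keeps the compiler from removing or merging it */
        __asm__ volatile("orq $0x800, %0" : "+r"(x) : : "cc");
#else
        x |= UINT64_C(1) << 11;   /* portable fallback, not a real test */
#endif
    }
    return x;
}

/* Hypothetical: same loop, with "bts" for bit 11 as the only change. */
uint64_t loop_bts(uint64_t x, unsigned long n)
{
    for (unsigned long i = 0; i < n; i++) {
#if defined(__x86_64__) && defined(__GNUC__)
        __asm__ volatile("btsq $11, %0" : "+r"(x) : : "cc");
#else
        x |= UINT64_C(1) << 11;   /* portable fallback, not a real test */
#endif
    }
    return x;
}
```

Both functions return the same value for the same input; timing them
(for example with clock() around a large n) is left out here because
the numbers depend on the machine and the compiler options.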

Ian


