On Fri, Sep 28, 2012 at 8:40 AM, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
> On Thu, Sep 27, 2012 at 10:52:48AM -0700, Ian Lance Taylor wrote:
>> On Thu, Sep 27, 2012 at 12:35 AM, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
>> > On Wed, Sep 26, 2012 at 04:20:52PM -0700, Ian Lance Taylor wrote:
>> >> On Wed, Sep 26, 2012 at 10:34 AM, Ondřej Bílka <neleai@xxxxxxxxx> wrote:
>> >>
>> >> > is there a reason why for example
>> >> > x=x|(1<<11);
>> >> > is not expanded into
>> >> > bts rax,11
>> >> > ?
>> >>
>> >> The bts instruction is never faster than the corresponding or
>> >> instruction. There's no reason to use it when setting a bit in the
>> >> low 32 bits.
>> >>
>> >> Ian
>> > Following benchmarks tells otherwise. On ivy bridge bts variant is twice
>> > faster than doing or.
>> >
>> > I used
>> >
>> > for(i=0;i<1000000;i++)
>> >   x=x|(1<<i);
>>
>> That is a rather odd benchmark. Almost all of the loop iterations
>> will do nothing because the 1 will be left shifted into nothingness.
> From intel reference manual:

Sure, I know. But I don't see why it is relevant. This is C. If you
want to test machine instructions, write assembly code.

>> And if you look back at what I said, I said they were equivalent when
>> setting one of the low order 32 bits, which is what was happening in
>> your original code.
> I did not say that i set lower 32 bits nor did I say that position I set
> is constant.

Well, I tried to answer the question you posed. You now seem to be
asking a different question. Perhaps it has a different answer. But
I'm not sure exactly what question you are asking.

>> Those loops are not equivalent even apart from bts vs. ori. One has
>> four instructions, the other has six.
> Two functions are equivalent if and only if for every input they produce
> same output. That one consist of 10 instructions while other 8 is
> irrelevant.

I thought the point of your example was a micro-benchmark to show that
bts is faster than ori. For a micro-benchmark of a single instruction,
it's highly relevant whether other instructions are being executed. I
apologize if I misunderstood the point of your test case.

Ian
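
For reference, a minimal sketch of the quoted loop with the shift kept well
defined in C. Shifting a value by its full width or more is undefined behavior,
so the sketch uses a 64-bit operand and reduces the bit index modulo 64. The
names set_bit and set_bits_loop are hypothetical and do not come from the
thread. The sketch makes no claim about whether a compiler emits or or bts for
it; checking that means inspecting the generated assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Set bit `pos` of a 64-bit value. `pos` must be below 64: shifting by
   the width of the type or more is undefined behavior in C. */
static uint64_t set_bit(uint64_t x, unsigned pos)
{
    assert(pos < 64);
    return x | (UINT64_C(1) << pos);
}

/* Loop comparable to the one quoted above, with the index reduced
   modulo 64 so that every iteration sets a real bit position. */
static uint64_t set_bits_loop(uint64_t x, unsigned long n)
{
    for (unsigned long i = 0; i < n; i++)
        x = set_bit(x, (unsigned)(i % 64));
    return x;
}
```

To time a single instruction rather than the surrounding loop, the two
variants being compared would also need the same number of instructions
around the one under test, which is the point made above about the loops not
being equivalent.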