I have a code snippet that GCC doesn't optimize the way I think it should:

https://godbolt.org/z/1qKvax

bar2() is a variant of bar1() that I tweaked by hand to avoid branches. I haven't run any benchmarks, but I would expect the branchless bar2() to perform better than bar1(). However, GCC does not automatically turn bar1() into something like bar2(): the generated code for bar1() and bar2() differs, and the code for bar1() contains a branch.

I'm generally trying to get an idea of how smart GCC's optimizer is and how much hand-holding I should provide. Could someone help me understand why GCC didn't generate the same branchless code for bar1() as for bar2()? Or perhaps avoiding the branch here doesn't actually help performance?

Josh

*SOURCE*

char const* foo();

int cursor = 0;

char const* bar1()
{
    char const* result = foo();
    if (result)
        ++cursor;
    return result;
}

char const* bar2()
{
    char const* result = foo();
    cursor += !!result;
    return result;
}

*GENERATED CODE*

bar1():
        sub     rsp, 8
        call    foo()
        test    rax, rax
        je      .L1
        add     DWORD PTR cursor[rip], 1
.L1:
        add     rsp, 8
        ret
bar2():
        sub     rsp, 8
        call    foo()
        cmp     rax, 1
        sbb     DWORD PTR cursor[rip], -1
        add     rsp, 8
        ret
cursor:
        .zero   4
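To make the branchless sequence for bar2() easier to read, here is a small C++ sketch that models what it computes. It assumes `cmp rax, 1` sets the carry flag exactly when `rax` is zero (an unsigned `rax < 1`), and that `sbb DWORD PTR cursor[rip], -1` computes `cursor - (-1) - CF`, i.e. `cursor + 1 - CF`. The helper names (`cmp_sbb_model`, `bar1_model`, `bar2_model`, `all_cases_agree`) are hypothetical, `cursor` is modelled as a 32-bit unsigned value so wraparound is well-defined, and only the carry flag is modelled. As the listing itself shows, the `sbb` with a memory operand always writes `cursor`, even when `foo()` returns null, whereas bar1() only stores to it on the non-null path; this sketch compares resulting values only.

```cpp
#include <cstdint>

// Hypothetical model of the two instructions GCC emitted for bar2():
//   cmp rax, 1                 ; CF = 1 iff rax < 1 (unsigned), i.e. rax == 0
//   sbb DWORD PTR cursor, -1   ; cursor = cursor - (-1) - CF = cursor + 1 - CF
// Only the carry flag is modelled; other flags are ignored.
std::uint32_t cmp_sbb_model(std::uint64_t rax, std::uint32_t cursor) {
    const std::uint32_t cf = (rax < 1u) ? 1u : 0u;        // carry set iff rax == 0
    return cursor - static_cast<std::uint32_t>(-1) - cf;  // wraps mod 2^32 like sbb
}

// Value-level model of bar1(): increment only when result is non-null.
std::uint32_t bar1_model(const char* result, std::uint32_t cursor) {
    if (result)
        ++cursor;
    return cursor;
}

// Value-level model of bar2(): add 0 or 1 without a branch in the source.
std::uint32_t bar2_model(const char* result, std::uint32_t cursor) {
    cursor += !!result;
    return cursor;
}

// Checks that all three models agree on a few sample inputs,
// including a null result and a cursor value that wraps around.
bool all_cases_agree() {
    static const char text[] = "x";
    const char* results[] = {nullptr, text, text + 1};
    const std::uint32_t cursors[] = {0u, 41u, 0xFFFFFFFFu};
    for (const char* r : results) {
        for (std::uint32_t c : cursors) {
            const std::uint64_t rax = reinterpret_cast<std::uintptr_t>(r);
            const std::uint32_t expected = bar1_model(r, c);
            if (bar2_model(r, c) != expected || cmp_sbb_model(rax, c) != expected)
                return false;
        }
    }
    return true;
}
```

Calling `all_cases_agree()` should return true for these sampled inputs, which is consistent with the `cmp`/`sbb` pair being a branch-free encoding of `cursor += !!result`. It says nothing about which version is faster.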