Re: Failure to optimize?

Jonathan Wakely via Gcc-help <gcc-help@xxxxxxxxxxx> · Tue, 12 Jan 2021 14:20:35 +0000

On Tue, 12 Jan 2021 at 13:37, Florian Weimer via Gcc-help
<gcc-help@xxxxxxxxxxx> wrote:
>
> * ☂Josh Chia (謝任中) via Gcc-help:
>
> > I have a code snippet, and I'm wondering why GCC didn't optimize it the way I
> > think it should:
> > https://godbolt.org/z/1qKvax
> >
> > bar2() is a variant of bar1() that I have manually tweaked to avoid
> > branches. I haven't done any benchmarks, but I would expect the branchless
> > bar2() to perform better than bar1(). However, GCC does not automatically
> > optimize bar1() into something like bar2(): the code generated for bar1()
> > and bar2() differs, and the code for bar1() contains a branch.
>
> The optimization is probably valid for C99, but not for C11, where the
> memory model prevents the compiler from introducing spurious writes:
> another thread may modify the variable concurrently, and if it does so
> only when foo returns NULL, the original bar1 function does not contain a
> data race, but the branchless version would.
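To make Florian's point concrete, here is a minimal sketch under stated assumptions. The question's actual code is only at the godbolt link, so `foo()`, `table` and `counter` below are hypothetical, with bar1() writing the shared variable only when foo() returns non-NULL, as Florian's scenario supposes. The two replay functions run one interleaving sequentially. A genuinely concurrent version would itself contain a data race (undefined behavior), so the lost update shown is one plausible outcome, not a guaranteed one.

```c
#include <stddef.h>

/* Hypothetical stand-ins: the real bar1()/bar2() are only at the godbolt
   link, so table, counter and foo() are assumptions for illustration. */
static int table[4];
static int counter;

/* Hypothetical lookup: returns NULL for out-of-range keys. */
static int *foo(int key)
{
    return (key >= 0 && key < 4) ? &table[key] : NULL;
}

/* Branchy form: counter is written only when foo() returns non-NULL. */
void bar1(int key)
{
    if (foo(key) != NULL)
        counter++;
}

/* Branchless form: counter is loaded and stored on every call, and the
   comparison result (0 or 1) is added unconditionally. */
void bar2(int key)
{
    counter += (foo(key) != NULL);
}

/* Single-threaded replay of one interleaving on the path where foo()
   returns NULL while another thread stores 42 to the shared variable. */
int replay_bar1_null_path(void)
{
    int shared = 10;
    shared = 42;          /* other thread: stores 42; bar1 does not write */
    return shared;        /* 42: the other thread's store survives */
}

int replay_bar2_null_path(void)
{
    int shared = 10;
    int tmp = shared;     /* bar2: loads 10 */
    shared = 42;          /* other thread: stores 42 */
    shared = tmp + 0;     /* bar2: stores back tmp + 0, i.e. 10 */
    return shared;        /* 10: the other thread's store is lost */
}
```

On the NULL path bar1() never touches the variable, while bar2() still performs a load and a store; that invented store is exactly what the C11 memory model forbids the compiler from introducing.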

I'm not sure about the rules for C, but in C++ the compiler can assume
there is no race, because the increment is not atomic. If another thread
accessed the variable concurrently, then the non-atomic store would be a
race even in the bar1 version.
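For contrast, a minimal C11 sketch of the same hypothetical shape with the increment made atomic (C++ would use `std::atomic<int>`). Accesses to an `atomic_int` from several threads are not data races, so concurrent modification is well defined instead of something the compiler may assume never happens. As before, `foo()`, `table` and `counter` are hypothetical, not the question's actual code.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical variant: the shared counter is declared atomic, so other
   threads may access it concurrently without a data race. */
static atomic_int counter;
static int table[4];

/* Hypothetical lookup: returns NULL for out-of-range keys. */
static int *foo(int key)
{
    return (key >= 0 && key < 4) ? &table[key] : NULL;
}

/* Conditional increment: counter is modified only when foo() returns
   non-NULL, and the increment is a single atomic read-modify-write. */
void bar1(int key)
{
    if (foo(key) != NULL)
        atomic_fetch_add(&counter, 1);
}
```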