On Tue, 12 Jan 2021, Josh Chia (謝任中) via Gcc-help wrote:

> I didn't mention it earlier, but I'd like to clarify that this is for "g++
> -O3 -std=c++17" on GCC 10.2.
>
> If I use "static thread_local int cursor = 0;" then cursor is thread-local
> and such a race should be impossible, but the generated code for bar1()
> still has a branch.

Whether 'cursor' is thread-local or not does not matter: it's possible to
take the address of a thread-local variable and pass it to another thread.

> The branch persists also if I do this instead for bar1():
> char const* bar1() {
>     char const* result = foo();
>     if (result)
>         cursor += 1;
>     else
>         cursor += 0;
>     return result;
> }

This is hard to optimize as 'cursor += 0' is optimized out quite early, and
it's hard to keep track that 'cursor' is unconditionally written in the
abstract machine.

Alexander
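
As an illustration of the point about thread-local variables, here is a
minimal sketch that is not part of the original exchange. The helper names
'bump' and 'cursor_after_remote_bump' are hypothetical; only the declaration
of 'cursor' is taken from the quoted message. It shows one thread handing the
address of its own instance of 'cursor' to a second thread, which then writes
to it.

```cpp
#include <thread>

// Same declaration as in the quoted message.
static thread_local int cursor = 0;

// Hypothetical helper: writes through whatever pointer it is given. It has
// no way of knowing that the pointee is another thread's thread_local object.
void bump(int* p) { *p += 1; }

// Hypothetical demonstration: the calling thread passes the address of *its*
// instance of 'cursor' to a worker thread, which modifies that instance.
int cursor_after_remote_bump() {
    cursor = 0;
    std::thread worker(bump, &cursor);  // &cursor is evaluated in the caller
    worker.join();                      // the worker's write is visible after join()
    return cursor;                      // reads the caller's instance
}
```

Calling cursor_after_remote_bump() should return 1, since the worker writes
to the caller's instance of 'cursor' rather than to its own. So thread_local
on its own does not establish that no other thread can reach the variable.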