On Wed, 18 Aug 2010, Ralf Baechle wrote:

> By rewriting the loop around all simple LL/SC blocks to C we reduce
> the amount of inline assembler and at the same time allow GCC to often
> fill the branch delay slots with something sensible or whatever else clever
> optimization it may have up in its sleeve.

Are you sure it won't reorder anything there that actually relies on the
atomic access to have succeeded?  I suggest adding barrier() after the loop.

  Maciej
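
To make the concern concrete, here is a minimal user-space sketch of the
shape under discussion. It is not the actual patch. The retry loop is
written in C around a single "try once" primitive, and barrier() follows
the loop so the compiler cannot move later memory accesses ahead of the
point where the update is known to have succeeded. The names
try_store_conditional() and sketch_atomic_add_return() are hypothetical,
and a weak GCC compare-and-swap builtin stands in for the LL/SC assembler
block so that the example builds on any target.

```c
/* Compiler-only barrier: an empty asm with a "memory" clobber, the usual
 * GCC definition of barrier().  It emits no instruction. */
#define barrier() __asm__ __volatile__("" : : : "memory")

/* Hypothetical stand-in for one LL/SC attempt (assumption: the real code
 * would be a short inline-asm block).  Returns nonzero if the store took
 * effect, zero if the caller has to retry. */
static int try_store_conditional(int *p, int old, int new_val)
{
	return __atomic_compare_exchange_n(p, &old, new_val, 1 /* weak */,
					   __ATOMIC_RELAXED, __ATOMIC_RELAXED);
}

/* Retry loop written in C, so the compiler is free to schedule around it. */
static int sketch_atomic_add_return(int *p, int delta)
{
	int old, result;

	do {
		old = __atomic_load_n(p, __ATOMIC_RELAXED);
		result = old + delta;
	} while (!try_store_conditional(p, old, result));

	/* Keep the compiler from hoisting later memory accesses above the
	 * successful update. */
	barrier();

	return result;
}
```

Note that barrier() here only constrains the compiler.  Whatever ordering
the hardware needs is a separate matter and is not addressed by this sketch.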