On Mon, Aug 23, 2010 at 01:54:24AM +0100, Maciej W. Rozycki wrote:

> > By rewriting the loop around all simple LL/SC blocks to C we reduce
> > the amount of inline assembler and at the same time allow GCC to often
> > fill the branch delay slots with something sensible, or with whatever
> > other clever optimization it may have up its sleeve.
>
>  Are you sure it won't reorder anything there that actually relies on the
> atomic access having succeeded?  I suggest adding barrier() after the
> loop.

None of the code touched by this patch had any barrier functionality.
Some of the functions, such as atomic_add, don't provide memory barriers,
but where one is needed a barrier has always been provided by C code near
the end of the function, for example in atomic_add_return.

  Ralf
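The sketch below is a minimal, portable illustration of the shape described above. It is not the MIPS kernel code. The retry loop is written in C, atomic_add adds no ordering, and atomic_add_return supplies its ordering in C after the loop. The helper `ll_sc_add_once` is hypothetical and stands in for a single LL/SC attempt. Because the sketch is not MIPS assembly, it emulates that attempt with GCC's weak `__atomic_compare_exchange_n`, which, like a store-conditional, may fail and force another pass round the loop. The full fence after the loop is an assumption standing in for the kernel's SMP barrier. It also acts as a compiler barrier.

```c
/* Hypothetical counter type modelled loosely on the kernel's atomic_t. */
typedef struct { int counter; } atomic_t;

/* Hypothetical stand-in for one LL/SC attempt: load, add, conditional
 * store. Returns nonzero if the store succeeded. A weak CAS is used
 * because, like SC, it is allowed to fail spuriously. */
static inline int ll_sc_add_once(atomic_t *v, int i, int *result)
{
	int old = __atomic_load_n(&v->counter, __ATOMIC_RELAXED);
	int newval = old + i;

	if (__atomic_compare_exchange_n(&v->counter, &old, newval, 1,
					__ATOMIC_RELAXED, __ATOMIC_RELAXED)) {
		*result = newval;
		return 1;
	}
	return 0;
}

/* No ordering guarantee: the retry loop is plain C, so the compiler is
 * free to schedule surrounding code around it. */
static inline void atomic_add(int i, atomic_t *v)
{
	int tmp;

	while (!ll_sc_add_once(v, i, &tmp))
		;	/* store failed, retry */
}

/* Ordering is added in C after the loop, only where it is needed. */
static inline int atomic_add_return(int i, atomic_t *v)
{
	int result;

	while (!ll_sc_add_once(v, i, &result))
		;	/* store failed, retry */
	__atomic_thread_fence(__ATOMIC_SEQ_CST);	/* stand-in for the SMP barrier */
	return result;
}
```

Starting from a zeroed counter, `atomic_add(5, &v)` leaves 5 in `v.counter`, and a following `atomic_add_return(3, &v)` returns 8.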