On Tue, Jun 09, 2015 at 05:18:17PM +0530, Vineet Gupta wrote:
> When auditing cmpxchg call sites, Chuck noted that gcc was optimizing
> away some of the desired LDs.
>
> | do {
> |     new = old = *ipi_data_ptr;
> |     new |= 1U << msg;
> | } while (cmpxchg(ipi_data_ptr, old, new) != old);
>
> was generating to below
>
> | 8015cef8:   ld     r2,[r4,0]        <-- First LD
> | 8015cefc:   bset   r1,r2,r1
> |
> | 8015cf00:   llock  r3,[r4]          <-- atomic op
> | 8015cf04:   brne   r3,r2,8015cf10
> | 8015cf08:   scond  r1,[r4]
> | 8015cf0c:   bnz    8015cf00
> |
> | 8015cf10:   brne   r3,r2,8015cf00   <-- Branch doesn't go to orig LD
>
> Although this was fixed by adding a ACCESS_ONCE in this call site, it
> seems safer (for now at least) to add compiler barrier to LLSC based
> cmpxchg

This is required even. cmpxchg() should include a full memory barrier
_before_ and _after_ the op. Both imply a compiler barrier.

Acked-by: Peter Zijlstra (Intel) <peterz@xxxxxxxxxxxxx>
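
For illustration only, here is a minimal userspace sketch of the point above: a cmpxchg-style helper bracketed by compiler barriers, so the compiler cannot reuse a stale load of the word across the operation. This is not the ARC LLSC implementation. The names cmpxchg_u32() and ipi_set_msg() are hypothetical, and the GCC/Clang __atomic_compare_exchange_n() builtin with __ATOMIC_SEQ_CST is assumed as a stand-in for the llock/scond sequence plus full memory barriers.

```c
#include <assert.h>
#include <stdint.h>

/* Compiler-only barrier: the compiler may not cache memory values across it. */
#define barrier() __asm__ __volatile__("" ::: "memory")

/*
 * Hypothetical stand-in for cmpxchg(): returns the value that was found
 * at *ptr. The store of new_val happens only if that value equals old.
 */
static inline uint32_t cmpxchg_u32(volatile uint32_t *ptr,
                                   uint32_t old, uint32_t new_val)
{
    uint32_t expected = old;

    barrier();  /* compiler barrier before the op */
    /* On failure the builtin writes the current value into expected. */
    __atomic_compare_exchange_n(ptr, &expected, new_val, 0,
                                __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
    barrier();  /* compiler barrier after the op */

    return expected;
}

/* Set bit msg in the shared word, retrying until the cmpxchg succeeds. */
static void ipi_set_msg(volatile uint32_t *ipi_data_ptr, unsigned int msg)
{
    uint32_t old, new_val;

    assert(msg < 32);

    do {
        /* The volatile read forces a fresh load on every iteration. */
        old = *ipi_data_ptr;
        new_val = old | (1U << msg);
    } while (cmpxchg_u32(ipi_data_ptr, old, new_val) != old);
}
```

With the barriers inside the helper, a call site that reads through a plain (non-volatile) pointer would still be forced to reload the word on retry, which is the safety margin the patch is after.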