On Fri, Feb 25, 2022 at 04:05:28AM +0800, Xi Ruoyao via Gcc-help wrote:
> On Thu, 2022-02-24 at 11:35 -0800, Satish Vasudeva wrote:
> > Thanks for the response.
> >
> > Looking further into libatomic library code, I do see 16B move
> > instructions have been used for atomic_exchange code like below. Just
> > wondering why it is not generating an intrinsic __atomic_load_16 using
> > this instruction.
> >
> > movdqa 0x0(%rbp),%xmm0
>
> Because both Intel and AMD have not claimed "this is atomic".  In
> __atomic_exchange movdqa is used as a normal data move instruction
> (actually, GCC optimized memcpy calls in libatomic code to this).

Yup.  Even on cores where this is atomic internally it is not atomic
when used on a system with a 64-bit (or 72-bit) memory bus.


Segher
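
To illustrate the point above: since a plain 16-byte move is not
guaranteed to be atomic, a 16-byte load has to be made atomic by some
other means. The following is only a minimal sketch of one such means,
a spinlock around an ordinary copy. It is not libatomic's actual code
(which may, for example, use cmpxchg16b where the CPU supports it).
The names pair16_t, locked_pair16_t, pair16_load and pair16_store are
hypothetical and exist only for this example.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical 16-byte payload; not a libatomic type. */
typedef struct {
    uint64_t lo;
    uint64_t hi;
} pair16_t;

_Static_assert(sizeof(pair16_t) == 16, "payload must be 16 bytes");

/* Hypothetical wrapper: the payload plus the spinlock that guards it. */
typedef struct {
    atomic_flag lock;
    pair16_t value;
} locked_pair16_t;

static void pair16_lock(locked_pair16_t *p)
{
    /* Spin until the flag was previously clear; acquire ordering keeps
       the guarded accesses from moving above the lock. */
    while (atomic_flag_test_and_set_explicit(&p->lock, memory_order_acquire))
        ;
}

static void pair16_unlock(locked_pair16_t *p)
{
    /* Release ordering keeps the guarded accesses from moving below
       the unlock. */
    atomic_flag_clear_explicit(&p->lock, memory_order_release);
}

/* Load all 16 bytes as one unit with respect to other lock holders. */
pair16_t pair16_load(locked_pair16_t *p)
{
    pair16_lock(p);
    /* The compiler is free to use movdqa or two 8-byte moves for this
       copy; the lock, not the instruction, provides the atomicity. */
    pair16_t v = p->value;
    pair16_unlock(p);
    return v;
}

/* Store all 16 bytes as one unit with respect to other lock holders. */
void pair16_store(locked_pair16_t *p, pair16_t v)
{
    pair16_lock(p);
    p->value = v;
    pair16_unlock(p);
}
```

An object would be set up as
locked_pair16_t p = { ATOMIC_FLAG_INIT, { 0, 0 } }; and then only
accessed through pair16_load and pair16_store. The guarantee holds only
as long as every access goes through the lock, which is the price of
not relying on undocumented hardware behaviour.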