On Thu, 2022-02-24 at 11:35 -0800, Satish Vasudeva wrote:
> Thanks for the response.
>
> Looking further into libatomic library code, I do see 16B move
> instructions have been used for atomic_exchange code like below. Just
> wondering why it is not generating an intrinsic __atomic_load_16 using
> this instruction.
>
> movdqa 0x0(%rbp),%xmm0

Because neither Intel nor AMD has claimed "this is atomic".

In __atomic_exchange, movdqa is used as a normal data move instruction
(actually, GCC optimized the memcpy calls in the libatomic code into this).

-- 
Xi Ruoyao <xry111@xxxxxxxxxxxxxxxx>
School of Aerospace Science and Technology, Xidian University