On Wed, 2022-02-23 at 08:42 -0800, Satish Vasudeva via Gcc-help wrote:
> Hi Team,
>
> I was looking at the hotspots in our software stack and interestingly I see
> libat_load_16_i1 seems to be one of the top in the list.
>
> I am trying to understand why that is the case. My suspicion is some kind
> of lock usage for 16B atomic accesses.
>
> I came across this discussion but frankly I am still confused.
> https://gcc.gnu.org/legacy-ml/gcc-patches/2017-01/msg02344.html
>
> Do you think the overhead of libat_load_16_i1 is due to spinlock usage?
> Also reading some other Intel CPU docs, it seems like the CPU does support
> loading 16B in single access. In that case can we optimize this for
> performance?

Open an issue at https://gcc.gnu.org/bugzilla, with a reference to the
Intel CPU documentation proving that specific models support 128-bit
loads. Don't just say "it seems like": nobody wants to write some nasty
SSE code and then find it doesn't work on any CPU.

-- 
Xi Ruoyao <xry111@xxxxxxxxxxxxxxxx>
School of Aerospace Science and Technology, Xidian University