Hi Team,

I was looking at the hotspots in our software stack and, interestingly, libat_load_16_i1 is near the top of the list. I am trying to understand why. My suspicion is that 16-byte atomic accesses take some kind of lock.

I came across this discussion, but frankly I am still confused:
https://gcc.gnu.org/legacy-ml/gcc-patches/2017-01/msg02344.html

Do you think the overhead of libat_load_16_i1 is due to spinlock usage? Also, from reading some Intel CPU documentation, it seems the CPU does support loading 16 bytes in a single access. If so, can this path be optimized for performance?

Thanks, and I appreciate your help.

Satish