Thanks for the response. Looking further into the libatomic library code,
I see that 16-byte move instructions are already used in the
atomic_exchange code, as in the instruction below. I am wondering why
__atomic_load_16 is not implemented using this instruction. (A minimal
reproducer for the load side is sketched after the quoted message below.)

    movdqa 0x0(%rbp),%xmm0

On Thu, Feb 24, 2022 at 11:09 AM Xi Ruoyao <xry111@xxxxxxxxxxxxxxxx> wrote:

> On Wed, 2022-02-23 at 08:42 -0800, Satish Vasudeva via Gcc-help wrote:
> > Hi Team,
> >
> > I was looking at the hotspots in our software stack, and interestingly
> > libat_load_16_i1 is one of the top entries in the list.
> >
> > I am trying to understand why that is the case. My suspicion is some
> > kind of lock usage for 16-byte atomic accesses.
> >
> > I came across this discussion, but frankly I am still confused:
> > https://gcc.gnu.org/legacy-ml/gcc-patches/2017-01/msg02344.html
> >
> > Do you think the overhead of libat_load_16_i1 is due to spinlock usage?
> > Also, reading some other Intel CPU docs, it seems the CPU does support
> > loading 16 bytes in a single access. In that case, can we optimize this
> > for performance?
>
> Open an issue at https://gcc.gnu.org/bugzilla, with a reference to the
> Intel CPU doc proving that some specific models support 128-bit loads.
>
> Don't use "it seems like"; nobody wants to write some nasty SSE code and
> then find it doesn't work on any CPU.
> --
> Xi Ruoyao <xry111@xxxxxxxxxxxxxxxx>
> School of Aerospace Science and Technology, Xidian University
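
For completeness, here is a minimal sketch of the kind of 16-byte atomic
load in question (assuming GCC on x86-64, linked with -latomic; the exact
libatomic variant the call resolves to, e.g. libat_load_16_i1, is picked
at runtime via IFUNC):

    #include <stdio.h>

    /* __int128 is a GCC extension on x86-64; any 16-byte type behaves
       the same for this purpose.  A static object is 16-byte aligned
       per the ABI. */
    typedef unsigned __int128 u128;

    static u128 shared;   /* 16-byte object accessed atomically */

    int main(void)
    {
        /* GCC does not inline a 16-byte atomic load here; it emits a
           call to __atomic_load_16, which libatomic resolves to one of
           its runtime-selected implementations. */
        u128 snapshot = __atomic_load_n(&shared, __ATOMIC_SEQ_CST);

        printf("low 64 bits: %llu\n", (unsigned long long)snapshot);
        return 0;
    }

Compiling this with "gcc -O2 test.c -latomic" and disassembling with
objdump -d should show the call to __atomic_load_16 rather than an inline
movdqa, which is the behavior the question above is about.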