Aarch64 / simd / ld1r question

Steve Ellcey <sellcey@xxxxxxxxxx> · Tue, 24 Apr 2018 08:33:40 -0700

I have an aarch64/simd question.  I want to load a 64 bit floating point
value into both halves of a 128 bit simd register.  I think that ld1r
will do that but I was wondering if there is a way of getting GCC to
generate an ld1r instruction with regular C syntax or with a __builtin
instead of having to use inline assembly.

I have tried:

/* x is a double at file scope; its declaration is not shown here. */
__Float64x2_t foo1(void)
{
	__Float64x2_t a = (__Float64x2_t) { x, x};
	return a;
}

But that generates a 64 bit load and a dup instruction.

I tried:

__Float64x2_t foo2(void)
{
	__Float64x2_t a = __builtin_aarch64_ld1v2df
				((const __builtin_aarch64_simd_df *) &x);
	return a;
}

But that generated a 64 bit ldr, not an ld1r, so only one 64 bit value got
put in the 128 bit vector register.  Is there a different builtin that would
generate an ld1r?  I see the '*aarch64_simd_ld1r<mode>' instruction in the
aarch64-simd.md file but I am not sure there is any way for me to generate
that.

Basically, I am trying to find the most efficient way to get two identical
64 bit constant values into the upper and lower halves of a 128 bit simd
register.  This is something I am looking at using in a vector sin/cos
routine for aarch64.
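
One candidate worth checking for this is the ACLE intrinsic vld1q_dup_f64
from <arm_neon.h>, which is documented as a load-and-replicate of one double
into both lanes.  The sketch below is only an illustration: load_dup, lane
and check_load_dup are hypothetical helper names, and the non-aarch64 branch
is a generic GCC vector-extension fallback added so the example builds on
other targets.  Whether a given GCC version and optimization level actually
emits ld1r for it (rather than ldr plus dup) is an assumption that should be
confirmed by inspecting the output of -O2 -S.

```c
#include <assert.h>

#if defined(__aarch64__)
#include <arm_neon.h>
typedef float64x2_t v2df;

/* Load one double from memory and replicate it into both 64 bit lanes. */
v2df load_dup(const double *p)
{
	return vld1q_dup_f64(p);
}

/* Read lane 0 or 1; the intrinsic needs a constant lane index. */
double lane(v2df v, int i)
{
	return i == 0 ? vgetq_lane_f64(v, 0) : vgetq_lane_f64(v, 1);
}
#else
/* Fallback (assumption: GCC/Clang vector extensions are available). */
typedef double v2df __attribute__((vector_size(16)));

v2df load_dup(const double *p)
{
	v2df v = { *p, *p };
	return v;
}

double lane(v2df v, int i)
{
	return v[i];
}
#endif

/* Quick self-check: both lanes must hold the value that was loaded. */
void check_load_dup(void)
{
	static const double k = 0.5;
	v2df v = load_dup(&k);

	assert(lane(v, 0) == k && lane(v, 1) == k);
}
```

Taking a pointer rather than a value keeps the memory operand visible to the
compiler, which is what a load-and-replicate instruction would need.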

Steve Ellcey
sellcey@xxxxxxxxxx