Hi Steve,

On 24/04/18 16:33, Steve Ellcey wrote:
> I have an aarch64/SIMD question. I want to load a 64-bit floating-point
> value into both halves of a 128-bit SIMD register. I think that ld1r will
> do that, but I was wondering if there is a way of getting GCC to generate
> an ld1r instruction with regular C syntax or with a __builtin, instead of
> having to use inline assembly. I have tried:
>
>   __Float64x2_t foo1(void)
>   {
>     __Float64x2_t a = (__Float64x2_t) { x, x };
>     return a;
>   }

Hmm, do you have any patches in your tree that affect this part of GCC?
For me the code:

  __Float64x2_t foo1(_Float64 *x)
  {
    __Float64x2_t a = (__Float64x2_t) { *x, *x };
    return a;
  }

generates, with current trunk at -O2:

  foo1:
          ld1r    {v0.2d}, [x0]
          ret

That is the *aarch64_simd_ld1rv2df pattern, which is a vec_duplicate of a MEM.

Thanks,
Kyrill
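
A minimal sketch of the same idiom written with GCC's generic vector extension instead of the internal __Float64x2_t type, assuming GCC (or Clang) with vector_size support. The typedef name v2df and the function splat_load are hypothetical names for illustration. The point it shows is Kyrill's: build the vector from two copies of the same dereferenced pointer, so the compiler sees a duplicate of a memory operand. On AArch64 at -O2 this is expected, though not guaranteed, to match the same vec_duplicate-of-MEM pattern and emit ld1r; on other targets the compiler chooses its own instruction.

```c
/* Generic vector-extension type: two doubles packed into 16 bytes.
   "v2df" is a hypothetical name for this sketch, not an ACLE type. */
typedef double v2df __attribute__ ((vector_size (16)));

/* Load one double from memory and duplicate it into both lanes.
   Same shape as foo1 above: both elements come from *p, which is
   what lets the compiler treat it as a duplicate of a single load. */
v2df splat_load (const double *p)
{
  return (v2df) { *p, *p };
}
```

To check what a given compiler does with it, build with `gcc -O2 -S` for an AArch64 target and look for ld1r in the generated assembly.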

> But that generates a 64-bit load and a dup instruction. I tried:
>
>   __Float64x2_t foo2(void)
>   {
>     __Float64x2_t a = __builtin_aarch64_ld1v2df ((const __builtin_aarch64_simd_df *) &x);
>     return a;
>   }
>
> But that generated a 64-bit ldr, not an ld1r, and so only one 64-bit
> value got put in the 128-bit vector register. Is there a different
> builtin that would generate an ld1r? I see the '*aarch64_simd_ld1r<mode>'
> instruction pattern in the aarch64-simd.md file, but I am not sure whether
> there is any way for me to generate it.
>
> Basically, I am trying to find the most efficient way to get two identical
> 64-bit constant values into the upper and lower halves of a 128-bit SIMD
> register. This is something I am looking at using in a vector sin/cos
> routine for aarch64.
>
> Steve Ellcey
> sellcey@xxxxxxxxxx
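
For the stated goal (two identical 64-bit constants in one 128-bit register), a sketch of two alternatives to a builtin or inline assembly, under stated assumptions. Option 1 writes the duplicated constant directly with GCC's vector extension and leaves it to the compiler to decide how to materialise it; the value 0.5 and the names v2df and half_splat are hypothetical, for illustration only. Option 2, compiled only on AArch64, uses the ACLE intrinsics from arm_neon.h: vld1q_dup_f64, which loads one float64_t and replicates it to both lanes (the ld1r operation), and vdupq_n_f64, which duplicates a scalar value. The wrapper names splat_acle_load and splat_acle_value are also hypothetical.

```c
/* Generic vector-extension type; "v2df" is a hypothetical name. */
typedef double v2df __attribute__ ((vector_size (16)));

/* Option 1: write the duplicated constant directly. No explicit
   load-and-duplicate is requested; the compiler picks how to build it.
   0.5 is an illustrative value, not a real sin/cos coefficient. */
static inline v2df half_splat (void)
{
  return (v2df) { 0.5, 0.5 };
}

#if defined (__aarch64__)
#include <arm_neon.h>

/* Option 2 (AArch64 only): ACLE intrinsics.
   vld1q_dup_f64: load one float64_t from memory into both lanes. */
static inline float64x2_t splat_acle_load (const float64_t *p)
{
  return vld1q_dup_f64 (p);
}

/* vdupq_n_f64: duplicate a scalar already held in a register. */
static inline float64x2_t splat_acle_value (float64_t c)
{
  return vdupq_n_f64 (c);
}
#endif
```

Which form is cheapest for a compile-time constant (a 128-bit literal-pool load holding both copies, an ld1r from a single 64-bit copy, or a register dup) may depend on the core, so it is worth comparing the generated code and timings inside the actual sin/cos loop.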