Hi Jochen,

> according to the manual, for the lower 8 bytes of the 16-byte Arm SIMD
> register, c is in the range 0..7;
>
> for the complete 16 bytes of the Arm SIMD register, c is in the range 0..15.

Yes, and the point is that index #0 is useless on an x1_t type. See your own
example below.

> example:
>
> #include <stdio.h>
> #include <arm_neon.h>
>
> uint64x1_t ext(uint64x1_t a, uint64x1_t b, int c) {
>     uint64x1_t result;
>     switch(c) {
>     case 0: asm("ext %0.8B, %1.8B, %2.8B, #0" : "=w" (result) : "w" (a), "w" (b)); break;
>     case 1: asm("ext %0.8B, %1.8B, %2.8B, #1" : "=w" (result) : "w" (a), "w" (b)); break;
>     case 2: asm("ext %0.8B, %1.8B, %2.8B, #2" : "=w" (result) : "w" (a), "w" (b)); break;
>     case 3: asm("ext %0.8B, %1.8B, %2.8B, #3" : "=w" (result) : "w" (a), "w" (b)); break;
>     case 4: asm("ext %0.8B, %1.8B, %2.8B, #4" : "=w" (result) : "w" (a), "w" (b)); break;
>     case 5: asm("ext %0.8B, %1.8B, %2.8B, #5" : "=w" (result) : "w" (a), "w" (b)); break;
>     case 6: asm("ext %0.8B, %1.8B, %2.8B, #6" : "=w" (result) : "w" (a), "w" (b)); break;
>     case 7: asm("ext %0.8B, %1.8B, %2.8B, #7" : "=w" (result) : "w" (a), "w" (b)); break;
>     }
>     return result;
> }
>
> int main(int argc, char **argv) {
>     uint64x1_t a, b, result;
>     a[0]=0x0011223344556677;
>     b[0]=0x8899aabbccddeeff;
>     for(int c=0; c<8; c++) {
>         result=ext(a, b, c);
>         printf("%d %016lx\n", c, result[0]);
>     }
>     return 0;
> }
>
> output:
>
> 0 0011223344556677

For index 0 you get back the same number that was in a[0]. There is no point in
the compiler emitting an instruction that returns the value it already had as
input.

Regards,
Tamar

> 1 ff00112233445566
> 2 eeff001122334455
> 3 ddeeff0011223344
> 4 ccddeeff00112233
> 5 bbccddeeff001122
> 6 aabbccddeeff0011
> 7 99aabbccddeeff00
>
> Kind regards, Jochen
>
> On 09.09.20 at 11:17, Tamar Christina wrote:
> > Hi Jochen,
> >
> > EXT is a byte-level extract. If you have a 64-bit vector and a 64-bit
> > type like uint64x1_t, then the only possible index for n is 0.
> >
> > While the compiler could have emitted
> >
> > ext v0.8b, v0.8b, v1.8b, #0
> >
> > this is pointless, as it essentially means returning v0.
> >
> > As such, the compiler just uses return __a; as there's no point in
> > emitting an instruction.
> >
> > Regards,
> > Tamar
> >
> >> -----Original Message-----
> >> From: Gcc-help <gcc-help-bounces@xxxxxxxxxxx> On Behalf Of Jochen
> >> Barth via Gcc-help
> >> Sent: Wednesday, September 2, 2020 10:10 PM
> >> To: gcc-help@xxxxxxxxxxx
> >> Subject: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b,
> >> __const int __c)
> >>
> >> Dear reader,
> >>
> >> the definition in aarch64/arm_neon.h (gcc 10.2) is
> >>
> >> __extension__ extern __inline uint64x1_t __attribute__
> >> ((__always_inline__, __gnu_inline__, __artificial__))
> >> vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c) {
> >>   __AARCH64_LANE_CHECK (__a, __c);
> >>   /* The only possible index to the assembler instruction returns
> >>      element 0. */
> >>   return __a;
> >> }
> >>
> >> So this function essentially does »return __a«.
> >>
> >> If the function name »vext_...« has, as the name suggests, something
> >> to do with the »ext« NEON SIMD instruction,
> >>
> >> then I do not understand where the asm-equivalent »ext« NEON
> >> intrinsic is, because the »Arm Architecture Reference Manual«,
> >> chapter C7.2.543,
> >> states: »<index> Is the lowest numbered byte element to be
> >> extracted...«, ranging from 0..7 for Q=8 and 0..15 for Q=16
> >> (extraction over the whole 128 bit register).
> >>
> >> PS: gcc with vector expressions does not (?) use »ext« for
> >> y=(x<<(c*8))
> >> | (x>>(64-c*8)); // for Q=8
> >>
> >> Kind regards, Jochen
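For readers without an AArch64 machine, the byte-level behaviour of "ext" on the 8B arrangement discussed in this thread can be modelled in portable C. This is only a sketch under stated assumptions: ext8_model is a hypothetical helper name, not part of arm_neon.h, and it treats byte 0 as the least significant byte of each 64-bit value, which is consistent with the output Jochen posted.

```c
#include <assert.h>
#include <stdint.h>

/* Portable model of "ext Vd.8B, Vn.8B, Vm.8B, #c" on plain 64-bit values.
   ext8_model is a hypothetical name used for illustration only.
   The result holds bytes c..7 of a in its low end, followed by
   bytes 0..c-1 of b in its high end. */
uint64_t ext8_model(uint64_t a, uint64_t b, int c)
{
    assert(c >= 0 && c <= 7);   /* byte index range for the 8B arrangement */
    if (c == 0)
        return a;               /* nothing is taken from b; also avoids an
                                   undefined 64-bit shift by 64 in C */
    return (a >> (8 * c)) | (b << (64 - 8 * c));
}
```

With a = 0x0011223344556677 and b = 0x8899aabbccddeeff, index 1 yields 0xff00112233445566, matching the posted output, and index 0 yields a unchanged, which is the case vext_u64 handles with a plain return __a.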