RE: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b, __const int __c)

Tamar Christina <Tamar.Christina@xxxxxxx> · Thu, 10 Sep 2020 08:34:19 +0000

Hi Jochen,

> -----Original Message-----
> From: Jochen Barth <jpunktbarth@xxxxxxxxx>
> Sent: Thursday, September 10, 2020 9:14 AM
> To: Tamar Christina <Tamar.Christina@xxxxxxx>
> Cc: gcc-help <gcc-help@xxxxxxxxxxx>
> Subject: Re: arm_neon.h / vext_u64 (uint64x1_t __a, uint64x1_t __b,
> __const int __c)
> 
> Dear Tamar,
> 
> Sorry, I do not get the point:
> 
> >>> EXT is a byte level extract, if you have a 64 bit vector and a
> >>> 64-bit type like uint64x1_t then the only possible index for n is 0.
> 

Because those intrinsics do not do byte-level extraction. They are convenience functions that
do not allow partial extraction of an element. For instance, vext_s16, which takes an int16x4_t as
input, restricts the values of n to 0 to 3 because, when it is mapped to the EXT instruction, the
index is always a multiple of 2 bytes, since an int16 is two bytes.

A uint64x1_t is a vector of 8 bytes, which the intrinsic treats as a single group of 8 bytes since
it always extracts whole elements. As such, the only possible index is 0.
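
As a rough sketch of that rule (plain C, not the real intrinsics; the helper names
vext_max_index and vext_byte_offset are made up for illustration), the valid index range
for a 64-bit vector follows directly from the element size:

```c
#include <assert.h>

/* Hypothetical helper: largest element index accepted for a 64-bit (8-byte)
   vector whose elements are elem_bytes wide. */
static unsigned vext_max_index(unsigned elem_bytes)
{
    return 8 / elem_bytes - 1;
}

/* Hypothetical helper: the byte offset that an element index n corresponds to.
   It is always a whole multiple of the element size. */
static unsigned vext_byte_offset(unsigned elem_bytes, unsigned n)
{
    assert(n <= vext_max_index(elem_bytes));
    return n * elem_bytes;
}
```

With 2-byte elements (the vext_s16 case) this gives indices 0 to 3 and byte offsets 0, 2, 4, 6.
With 8-byte elements (the vext_u64 case) the only index left is 0.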

To get the behavior you have in your example, you need to do the extraction on bytes using
vext_u8, which lets you extract across the middle of the 64-bit number. That is,

what you want is

 vreinterpret_u64_u8 (vext_u8 (vreinterpret_u8_u64 (a), vreinterpret_u8_u64 (b), <number>))

where your extraction happens on bytes. In this case n has the range 0-7.
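
To illustrate what the byte-level variant computes, here is a portable scalar model in plain C.
It is only a sketch and does not use arm_neon.h: the function name ext_bytes_model is
hypothetical, and it assumes the register-level view in which byte lane 0 is the least
significant byte of the 64-bit value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar model of a byte-level extract on two 8-byte vectors.
   Result lanes are a[n..7] followed by b[0..n-1], with lane 0 taken to be
   the least significant byte (an assumption of this sketch). */
static uint64_t ext_bytes_model(uint64_t a, uint64_t b, unsigned n)
{
    assert(n < 8);
    if (n == 0)
        return a;               /* avoid an undefined 64-bit shift of b */
    return (a >> (8 * n)) | (b << (8 * (8 - n)));
}
```

For example, with a = 0x0706050403020100, b = 0x0F0E0D0C0B0A0908 and n = 3, the model
returns 0x0A09080706050403: the top five bytes of a followed by the bottom three bytes of b.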

Instead of looking at the Arm ARM you should look at the definition of the intrinsics
https://developer.arm.com/architectures/instruction-sets/simd-isas/neon/intrinsics

Regards,
Tamar

> But my previous examples with n=c=1..7 showed that different (n=c)'s are
> possible,
> 
> why is "the only possible index for n=0" ?
> 
> Kind regards, Jochen