Machine constraint for ARM vector extract?

Jeffrey Walton <noloader@xxxxxxxxx> · Fri, 20 Jan 2017 15:16:29 -0500

Hi Everyone,

I have an inline function to help with a vector extract. It is a
little more direct, and it avoids most of the conversions required for
intrinsics:

inline uint64x2_t VEXT_8(uint64x2_t a, uint64x2_t b, unsigned int c)
{
    uint64x2_t r;
    __asm __volatile("ext   %0.16b, %1.16b, %2.16b, %3 \n\t"
        :"=w" (r) : "w" (a), "w" (b), "I" (c) );
    return r;
}

The compile is failing under Debug builds when no optimizations are used:

   /opt/cfarm/gcc-latest/bin/g++ -g3 -O0 -march=armv8-a+crc+crypto -D_GLIBCXX_DEBUG -c gcm.cpp
   gcm.cpp: In function 'uint64x2_t VEXT_8(uint64x2_t, uint64x2_t, unsigned int)':
   gcm.cpp:90:48: warning: asm operand 3 probably doesn't match constraints
            :"=w" (r) : "w" (a), "w" (b), "I" (c) );
                                                ^
   gcm.cpp:90:48: error: impossible constraint in 'asm'

The function is being used like:

    uint64x2_t c0, c1, c2;
    ...
    c2 = veorq_u64(c0, VEXT_8(vdupq_n_u64(0), c1, 8));

Release builds, which use -g2 and -O3, compile fine.

I'm trying to avoid a template parameter for 'c'. It seems like it
should work since the intrinsic works in Debug builds.
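
For reference, the template form I would like to avoid looks roughly
like the sketch below. It is only a sketch: the names VEXT_8_T, vec128,
make_vec and get_lane are made up for illustration, and the non-ARM
branch is a plain-C++ stand-in (assuming EXT takes 16 bytes starting at
byte C of the pair a:b, with a as the low half) so the snippet builds
off-target. Because C is a compile-time constant here, it should be
able to satisfy the "I" constraint even without optimization.

```cpp
#include <cstdint>
#include <cstring>

#if defined(__aarch64__)
#include <arm_neon.h>
typedef uint64x2_t vec128;
#else
// Assumption: plain 16-byte stand-in so the sketch also builds off-target.
struct vec128 { std::uint64_t lane[2]; };
#endif

// Hypothetical helpers (not in gcm.cpp) for building and inspecting values.
inline vec128 make_vec(std::uint64_t lo, std::uint64_t hi)
{
    const std::uint64_t lanes[2] = { lo, hi };
    vec128 v;
    std::memcpy(&v, lanes, sizeof(v));
    return v;
}

inline std::uint64_t get_lane(vec128 v, unsigned int i)
{
    std::uint64_t lanes[2];
    std::memcpy(lanes, &v, sizeof(lanes));
    return lanes[i];
}

// Template variant: the byte index C is a constant expression, so it does
// not depend on the optimizer propagating a function argument.
template <unsigned int C>
inline vec128 VEXT_8_T(vec128 a, vec128 b)
{
    static_assert(C < 16, "EXT byte index must be in 0..15");
#if defined(__aarch64__)
    vec128 r;
    __asm __volatile("ext   %0.16b, %1.16b, %2.16b, %3 \n\t"
        :"=w" (r) : "w" (a), "w" (b), "I" (C) );
    return r;
#else
    // Stand-in for EXT: concatenate a (low) and b (high), then take
    // 16 bytes starting at offset C.
    unsigned char buf[32];
    std::memcpy(buf, &a, 16);
    std::memcpy(buf + 16, &b, 16);
    vec128 r;
    std::memcpy(&r, buf + C, 16);
    return r;
#endif
}
```

The call site would then become
c2 = veorq_u64(c0, VEXT_8_T<8>(vdupq_n_u64(0), c1));
which is the change in calling convention I was hoping not to make.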

My questions are: is it possible to do this without a template
parameter? If so, what machine constraint should we use for 'c'?

Thanks in advance