NEON and instruct GCC to move a lane without using a regular register?

I'm having a heck of a time getting GCC to perform a direct lane-to-lane
transfer between NEON registers, without going through a general-purpose
register.

I have the following C-code:

    /* No trailing semicolon in the macro; the call site supplies it. */
    #define set_high_from_high(d, m) \
        ((d) = vsetq_lane_u64(vgetq_lane_u64((m), LANE_H64), (d), LANE_H64))


    uint64x2_t x, m;
    ...

    set_high_from_high(x, m);

GCC is generating something like:

    mov x0, v2.2d[0]
    mov v1.2d[0], x0

Instead of:

    mov v1.2d[0], v2.2d[0]

I've abandoned inline functions in favor of macros. I've also tried the
macro with and without the 'd=' assignment.

How do I instruct GCC to perform the NEON to NEON lane transfer?
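
For reference, two other spellings of the same operation might be worth
comparing. This is only a sketch, not something confirmed to change GCC's
output. The helper names (u64x2, make_u64x2, low64, high64,
set_high_from_high_copy, set_high_from_high_combine, check_variants) are
made up for illustration, lane index 1 is assumed to be what LANE_H64
denotes, and the non-AArch64 branch is a scalar stand-in so the snippet
builds anywhere. vcopyq_laneq_u64 is the AArch64-only ACLE intrinsic for a
lane-to-lane copy; whether a given GCC release provides it, or lowers either
variant to a single lane move, has to be checked in the disassembly.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__aarch64__)
#include <arm_neon.h>

typedef uint64x2_t u64x2;

/* Build a 128-bit vector from its low and high 64-bit halves. */
static inline u64x2 make_u64x2(uint64_t lo, uint64_t hi)
{
    return vcombine_u64(vcreate_u64(lo), vcreate_u64(hi));
}

static inline uint64_t low64(u64x2 v)  { return vgetq_lane_u64(v, 0); }
static inline uint64_t high64(u64x2 v) { return vgetq_lane_u64(v, 1); }

/* Variant 1: ACLE lane-to-lane copy (AArch64 only).
 * Copies lane 1 of m into lane 1 of d. */
static inline u64x2 set_high_from_high_copy(u64x2 d, u64x2 m)
{
    return vcopyq_laneq_u64(d, 1, m, 1);
}

/* Variant 2: split into 64-bit halves and recombine.
 * Keeps the low half of d and takes the high half of m. */
static inline u64x2 set_high_from_high_combine(u64x2 d, u64x2 m)
{
    return vcombine_u64(vget_low_u64(d), vget_high_u64(m));
}

#else

/* Scalar stand-in (assumption: only here so the sketch builds off-target). */
typedef struct { uint64_t lane[2]; } u64x2;

static inline u64x2 make_u64x2(uint64_t lo, uint64_t hi)
{
    u64x2 v = { { lo, hi } };
    return v;
}

static inline uint64_t low64(u64x2 v)  { return v.lane[0]; }
static inline uint64_t high64(u64x2 v) { return v.lane[1]; }

static inline u64x2 set_high_from_high_copy(u64x2 d, u64x2 m)
{
    d.lane[1] = m.lane[1];
    return d;
}

static inline u64x2 set_high_from_high_combine(u64x2 d, u64x2 m)
{
    u64x2 r = { { d.lane[0], m.lane[1] } };
    return r;
}

#endif

/* Both variants must keep d's low lane and take m's high lane. */
static inline void check_variants(void)
{
    u64x2 d = make_u64x2(1, 2);
    u64x2 m = make_u64x2(3, 4);

    u64x2 a = set_high_from_high_copy(d, m);
    u64x2 b = set_high_from_high_combine(d, m);

    assert(low64(a) == 1 && high64(a) == 4);
    assert(low64(b) == 1 && high64(b) == 4);
}
```

Both variants return the new vector instead of assigning through a macro
argument, so a call site would read x = set_high_from_high_copy(x, m);.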

*****

I know it can be done because Clang is doing it. GCC is lagging behind
Clang by about 4 cycles per byte. Here are the disassembly line counts
for the same function under each compiler:

GCC at -O3
$ gdb -batch -ex 'disassemble BLAKE2_NEON_Compress64' ./blake2.o | wc -l
2021

Clang at -O3
$ gdb -batch -ex 'disassemble BLAKE2_NEON_Compress64' ./blake2.o | wc -l
445


