Re: Scatter/Gather vector operations

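/* Assumed definition of v4sf, in the style of the GCC manual. */
typedef float v4sf __attribute__ ((vector_size (16)));

/* No "typedef" here: vector must be an array object, not a type name. */
union
{
    float f[4];
    v4sf  v;
} vector[100];

/* Unpack: element k of every vector goes to row<k+1>. */
for (i = 100; --i >= 0;) {
    row1[i] = vector[i].f[0];
    row2[i] = vector[i].f[1];
    row3[i] = vector[i].f[2];
    row4[i] = vector[i].f[3];
}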

Unroll by four: load four vectors, swap data around in
registers, store four vectors.
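
A rough sketch of what "unroll by four" could look like with GCC's generic vector extensions, not a tuned implementation. The names vec4, SHUF2 and unpack_rows are made up for illustration. It assumes the count is a multiple of four, and a compiler that provides __builtin_shuffle (GCC) or __builtin_shufflevector (clang).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* GCC-style generic vector types; 16 bytes = four floats or four ints. */
typedef float v4sf __attribute__((vector_size(16)));
typedef int   v4si __attribute__((vector_size(16)));

/* Hypothetical name for the union used in the question. */
typedef union {
    float f[4];
    v4sf  v;
} vec4;

/* Two-input shuffle: indices 0-3 select from a, 4-7 select from b. */
#if defined(__clang__)
#define SHUF2(a, b, i0, i1, i2, i3) __builtin_shufflevector(a, b, i0, i1, i2, i3)
#else
#define SHUF2(a, b, i0, i1, i2, i3) __builtin_shuffle(a, b, (v4si){i0, i1, i2, i3})
#endif

/* Unpack n vectors into four rows, four vectors per pass.
   Assumption: n is a multiple of 4. */
void unpack_rows(const vec4 *in, size_t n,
                 float *row1, float *row2, float *row3, float *row4)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        /* Load four vectors. */
        v4sf a = in[i].v, b = in[i + 1].v, c = in[i + 2].v, d = in[i + 3].v;

        /* Swap data around in registers: a 4x4 transpose in two stages. */
        v4sf t0 = SHUF2(a, b, 0, 4, 1, 5);     /* a0 b0 a1 b1 */
        v4sf t1 = SHUF2(c, d, 0, 4, 1, 5);     /* c0 d0 c1 d1 */
        v4sf t2 = SHUF2(a, b, 2, 6, 3, 7);     /* a2 b2 a3 b3 */
        v4sf t3 = SHUF2(c, d, 2, 6, 3, 7);     /* c2 d2 c3 d3 */
        v4sf r0 = SHUF2(t0, t1, 0, 1, 4, 5);   /* a0 b0 c0 d0 */
        v4sf r1 = SHUF2(t0, t1, 2, 3, 6, 7);   /* a1 b1 c1 d1 */
        v4sf r2 = SHUF2(t2, t3, 0, 1, 4, 5);   /* a2 b2 c2 d2 */
        v4sf r3 = SHUF2(t2, t3, 2, 3, 6, 7);   /* a3 b3 c3 d3 */

        /* Store four vectors; memcpy avoids assuming the rows are
           16-byte aligned. */
        memcpy(&row1[i], &r0, sizeof r0);
        memcpy(&row2[i], &r1, sizeof r1);
        memcpy(&row3[i], &r2, sizeof r2);
        memcpy(&row4[i], &r3, sizeof r3);
    }
}
```

For the arrays in the question this would be called as unpack_rows(vector, 100, row1, row2, row3, row4). Whether the shuffles map to good instructions depends on the target.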

Is this the only portable way to do a pack/unpack without asm()?

If you want code that is fully portable at the C level, without
any conditionals, this is pretty much it.  If you just want to
avoid asm(), there are intrinsics you can use.
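
As one illustration of the intrinsics route, here is a sketch for x86 SSE only. It uses _mm_loadu_ps, _MM_TRANSPOSE4_PS and _mm_storeu_ps from <xmmintrin.h>, and falls back to the plain loop when __SSE__ is not defined. The function name unpack_rows_intrin and the flat input layout (4*n floats, four per vector) are assumptions for the example; other processors would need their own intrinsics.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>   /* SSE: _mm_loadu_ps, _mm_storeu_ps, _MM_TRANSPOSE4_PS */
#endif

/* in holds n vectors of four floats each, stored back to back.
   Assumption: n is a multiple of 4. */
void unpack_rows_intrin(const float *in, size_t n,
                        float *row1, float *row2, float *row3, float *row4)
{
    assert(n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
#if defined(__SSE__)
        /* Load four vectors (unaligned loads, so no alignment is assumed). */
        __m128 a = _mm_loadu_ps(in + 4 * i);
        __m128 b = _mm_loadu_ps(in + 4 * i + 4);
        __m128 c = _mm_loadu_ps(in + 4 * i + 8);
        __m128 d = _mm_loadu_ps(in + 4 * i + 12);

        /* In-register 4x4 transpose; the macro rewrites a, b, c, d in place. */
        _MM_TRANSPOSE4_PS(a, b, c, d);

        /* Store four vectors, one per output row. */
        _mm_storeu_ps(&row1[i], a);
        _mm_storeu_ps(&row2[i], b);
        _mm_storeu_ps(&row3[i], c);
        _mm_storeu_ps(&row4[i], d);
#else
        /* No SSE available: same result with the portable scalar loop. */
        for (size_t k = i; k < i + 4; k++) {
            row1[k] = in[4 * k + 0];
            row2[k] = in[4 * k + 1];
            row3[k] = in[4 * k + 2];
            row4[k] = in[4 * k + 3];
        }
#endif
    }
}
```

The #if is exactly the kind of conditional the fully portable version avoids, which is the price for not writing asm().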

How do I set it up differently to trigger a pack/unpack optimization?

Perhaps the auto-vectorisers aren't smart enough (yet) to
do this for you.  If your goal is great performance, you
really have to write a special version for every processor;
although auto-vectorisation certainly can speed up things
quite a bit, hand-written vector code can be *much* faster.
A big part of the problem is that many vector instruction sets
are very limited, or just "different".


Segher

