Dzonatas wrote:
Hello,
Please, help me figure out how to trigger the scatter/gather
(pack/unpack) vector operations in gcc.
I need to move data from four pointer locations to a vector and vice
versa with no extra in-line asm(). Currently, I have code that looks
like this:
typedef float v4sf __attribute__ ((vector_size (16))) ;
union
{
float f[4] ;
v4sf v ;
} vector[100] ;
for( i=100; --i>=0;) {
row1[i] = vector[i].f[0];
row2[i] = vector[i].f[1];
row3[i] = vector[i].f[2];
row4[i] = vector[i].f[3];
}
Is this the only portable way to do a pack/unpack without asm()?
How do I set it up differently to trigger a pack/unpack optimization?
Thank you.
If you're asking about optimization for a specific CPU but don't want
to say which CPU that is, why post this at all? This code looks OK to
me. Commonly available CPUs, like Opteron or Xeon, have no special
hardware support for scatter/gather. Scalar moves should work as well
as anything, and you are within the limits for efficient
write-combining buffering. If you have problems, you won't get any
help unless you describe them more specifically.
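For reference, the scalar approach can be written out as a small self-contained sketch. This is only an illustration, not a guaranteed way to trigger any particular optimization: it assumes GCC's vector_size attribute for the v4sf type and relies on GCC's documented support for reading a union member other than the one last written. The names vec4, unpack_rows and pack_rows are hypothetical and do not come from the original post.

```c
#include <stddef.h>

/* Assumption: GCC vector extension, four packed floats (16 bytes). */
typedef float v4sf __attribute__ ((vector_size (16)));

/* Hypothetical name: union giving whole-vector and per-element access. */
typedef union {
    float f[4];
    v4sf  v;
} vec4;

/* Unpack (scatter): split n packed vectors into four separate rows
   using plain scalar moves, as in the loop from the question. */
static void unpack_rows(const vec4 *src, size_t n,
                        float *row1, float *row2,
                        float *row3, float *row4)
{
    for (size_t i = 0; i < n; i++) {
        row1[i] = src[i].f[0];
        row2[i] = src[i].f[1];
        row3[i] = src[i].f[2];
        row4[i] = src[i].f[3];
    }
}

/* Pack (gather): the reverse direction, building each vector from
   one element of each row. */
static void pack_rows(vec4 *dst, size_t n,
                      const float *row1, const float *row2,
                      const float *row3, const float *row4)
{
    for (size_t i = 0; i < n; i++) {
        dst[i].f[0] = row1[i];
        dst[i].f[1] = row2[i];
        dst[i].f[2] = row3[i];
        dst[i].f[3] = row4[i];
    }
}
```

Whether the compiler turns these loops into anything better than scalar loads and stores depends on the target and the optimization flags, so it is worth checking the generated assembly with gcc -S before assuming either outcome.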