Re: Scatter/Gather vector operations

Dzonatas <dzonatas@xxxxxxxxxx> · Sun, 08 Apr 2007 01:02:53 -0700

Tim Prince wrote:
> Dzonatas wrote:
>> Is this the only portable way to do a pack/unpack without asm()?
>> How do I set it up differently to trigger a pack/unpack optimization?
>>
>> Thank you.
>
> If you're talking about optimization for a specific CPU, but you don't
> want to reveal which CPU that is, why even post this?

No. I'm just trying to get an idea of what direction the future of such
code may take, and I also wonder what the best format is for now.

> This code looks OK to me.  There isn't any special hardware support
> for this on commonly available CPUs, like Opteron or Xeon. Scalar
> moves should work as well as anything, and you are within the limits
> for efficient Write Combine buffering.  If you have problems, you
> won't get any help if you can't describe them more specifically.

The problem is bandwidth. Vector operations help greatly with that
alone, even apart from the matrix math.

Currently, the immediate targets are SSE2- and AltiVec-enabled
architectures. I could probably write SSE2/AltiVec-specific assembly
that unpacks a vector and scatters its elements, but I don't want to
aim that narrowly. I would like to avoid assembly code if possible.
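One way to express the unpack-and-scatter step without asm() is the generic vector extension offered by GCC and Clang (not ISO C). The sketch below assumes such a toolchain and 32-bit integer lanes; the names v4si and unpack_scatter4 are hypothetical. Whether the compiler emits SSE2 or AltiVec lane-extract instructions or plain scalar stores depends on the target and flags.

```c
#include <stddef.h>
#include <stdint.h>
#include <assert.h>

/* GCC/Clang vector extension (an assumption, not ISO C): four 32-bit
   integers in one 16-byte vector. On SSE2 or AltiVec targets the
   compiler can keep this in a vector register; elsewhere it falls back
   to scalar code. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical helper: unpack v and store lane k to the address dst[k].
   Lanes are extracted with the extension's subscript operator, so no
   inline assembly is needed. */
static void unpack_scatter4(v4si v, int32_t *const dst[4])
{
    for (int k = 0; k < 4; k++) {
        assert(dst[k] != NULL);  /* every destination must be valid */
        *dst[k] = v[k];
    }
}
```

The destinations can be arbitrary and need not be contiguous, which is what distinguishes a scatter from an ordinary vector store.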

For example, is there a standard way to use a vector register as a set
of pointers into main memory and fetch the data they point to into
another vector register (a gather), or store through them (a scatter)?
I know this is beyond the basic vector operations implemented today,
but something like:

for (i = 0; i < 100; i++) {
    *vector_reg1 = vector_reg2;  // scatter: store each element of reg2 to
                                 // the address held in the matching
                                 // element of reg1
    vector_reg1++;
}
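Since commonly available CPUs lack such an instruction, the semantics of the hypothetical loop above can at least be pinned down with a scalar model in portable C. This is a sketch under stated assumptions: a vector width of four, float data, and the hypothetical names ptrvec, fvec, scatter, gather and advance, where the structs merely stand in for vector registers. A compiler may or may not vectorize these loops.

```c
#include <stddef.h>
#include <assert.h>

#define LANES 4  /* hypothetical vector width: four elements per "register" */

/* Stand-ins for vector registers (assumptions for illustration only). */
typedef struct { float *p[LANES]; } ptrvec;  /* a register of addresses */
typedef struct { float v[LANES]; } fvec;     /* a register of data */

/* Scatter: store lane k of src to the address held in lane k of addr. */
static void scatter(ptrvec addr, fvec src)
{
    for (size_t k = 0; k < LANES; k++) {
        assert(addr.p[k] != NULL);
        *addr.p[k] = src.v[k];
    }
}

/* Gather: load lane k of the result from the address in lane k of addr. */
static fvec gather(ptrvec addr)
{
    fvec out;
    for (size_t k = 0; k < LANES; k++) {
        assert(addr.p[k] != NULL);
        out.v[k] = *addr.p[k];
    }
    return out;
}

/* Advance every address lane by one element, modelling vector_reg1++. */
static void advance(ptrvec *addr)
{
    for (size_t k = 0; k < LANES; k++)
        addr->p[k]++;
}
```

Under this model, the loop above becomes `scatter(a, s); advance(&a);` repeated 100 times, with each lane's destination row sized to hold 100 elements.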

Thanks for the response.

--