Re: Scatter/Gather vector operations

Tim Prince wrote:
Dzonatas wrote:
Is this the only portable way to do a pack/unpack without asm()? How do I set it up differently to trigger a pack/unpack optimization?

Thank you.

If you're talking about optimization for a specific CPU, but you don't want to reveal which CPU that is, why even post this?
No. I'm just trying to get an idea of what direction the future of such code may take, as I also wonder what is the best format for now.
This code looks OK to me. There isn't any special hardware support for this on commonly available CPUs, like Opteron or Xeon. Scalar moves should work as well as anything, and you are within the limits for efficient Write Combine buffering. If you have problems, you won't get any help if you can't describe them more specifically.

The problem is bandwidth. Vector operations help greatly with that alone, quite apart from the matrix math.

Currently, the immediate targets are SSE2- and AltiVec-enabled architectures. I could probably work around it in assembly, using SSE2/AltiVec-specific instructions to unpack a vector and scatter the data, but I don't want to aim that short. I would like to avoid assembly code if possible.

For example, is there a formal way to use a vector register as a set of pointers into main memory and fetch the data they point to into another vector register? I know this is beyond the basic vector operations implemented now, but something like:

for (i = 0; i < 100; i++) {
    /* scatter each data element from reg2 into the memory pointed to
       by the associated element in reg1 */
    *vector_reg1 = vector_reg2;
    vector_reg1++;
}
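
For comparison, here is a minimal sketch of what that loop means when written as portable scalar C, roughly the "scalar moves" form mentioned earlier in the thread. The names scatter_f32 and gather_f32 and the choice of float elements are assumptions made for illustration; they are not from the original code or from any library.

```c
#include <stddef.h>

/* Scatter: store src[i] to the address held in ptrs[i].
   Hypothetical helper; "ptrs" plays the role of vector_reg1 and
   "src" the role of vector_reg2 in the loop above. */
void scatter_f32(float *const *ptrs, const float *src, size_t n)
{
    for (size_t i = 0; i < n; i++)
        *ptrs[i] = src[i];
}

/* Gather: the reverse direction, loading dst[i] from the address
   held in ptrs[i]. */
void gather_f32(float *dst, float *const *ptrs, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = *ptrs[i];
}
```

Called with n = 4, each function handles the four lanes of one 128-bit vector of floats, one element at a time. Whether a compiler turns loops like these into anything better than scalar loads and stores depends on the target; nothing here forces vectorization.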

Thanks for the response.


--
