Hi Brian,
>Could the problem be that a Camera class cannot be allocated on the heap in such a way that allows 16 byte alignment of the vector data types?
Oh yes, I believe that is very possibly the problem.
On my system, it appears that the memory allocation is fixed as if __attribute__((aligned(8))) is imposed on the allocation. (This has no bearing on padding.)
For example: struct three { char m[3]; }; three* p = new three[4];
The addresses could be... &p[0] == 0x10008; &p[1] == 0x1000B; &p[2] == 0x1000E; &p[3] == 0x10011;
Notice that only the first one is aligned on an 8-byte boundary; the elements after it are packed at 3-byte strides.
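If you want to see what your own heap manager hands back, a quick check along these lines may help. This is only a sketch: is_aligned and count_aligned_elements are names I made up for illustration, and the result you get is platform dependent.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true if p is a multiple of `alignment`,
// where `alignment` must be a power of two.
inline bool is_aligned(const void* p, std::size_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (reinterpret_cast<std::uintptr_t>(p) & (alignment - 1)) == 0;
}

struct three { char m[3]; };

// Hypothetical helper: allocates three[n] on the heap and counts how many
// elements happen to sit on an `alignment`-byte boundary.
inline std::size_t count_aligned_elements(std::size_t n, std::size_t alignment) {
    three* p = new three[n];
    std::size_t count = 0;
    for (std::size_t i = 0; i < n; ++i) {
        if (is_aligned(&p[i], alignment)) ++count;
    }
    delete[] p;
    return count;
}
```

Calling count_aligned_elements(4, 8) on a heap that behaves like mine would report a single aligned element (the first one), but I would not rely on that number anywhere else.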
The alignment "promise" of the heap management subsystem is platform dependent. As far as I am aware, there is no standard means to communicate alignment requirements to the heap manager. :-(
Some heap managers, such as the one with SAS/C++, have lots of knobs to programmatically tweak heap manager behavior. But that kind of API is not standard C or C++, and I'd be hesitant to rely upon it if portability is a concern (and for me, it is always a concern).
>This occurs to me now because of what you said earlier about allocating by malloc, and also because my test program ONLY included object on the stack.
Serendipitous comment! :-)
>If this is the case, do I need to use a special memory allocator that does aligned heap allocations?
Yes. In C++, you can define class-specific operator new, new[], delete and delete[] for your class and build the desired alignment behavior into them.
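A minimal sketch of that approach, under some assumptions: aligned16_alloc and aligned16_free are hypothetical helpers of my own, the Camera members are placeholders (float[4] stands in for __m128 so the sketch compiles anywhere), and the alignment is done by over-allocating with malloc and rounding up.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper: over-allocate with malloc, round up to the next
// 16-byte boundary, and stash the original pointer just below the block.
inline void* aligned16_alloc(std::size_t size) {
    const std::size_t align = 16;
    void* raw = std::malloc(size + align + sizeof(void*));
    if (!raw) throw std::bad_alloc();
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned =
        (start + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;  // remember what to free
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical helper: recover the pointer malloc returned and free it.
inline void aligned16_free(void* p) {
    if (p) std::free(static_cast<void**>(p)[-1]);
}

// Placeholder Camera; the real class would hold __m128 members.
struct Camera {
    float position[4];
    float direction[4];

    static void* operator new(std::size_t size)   { return aligned16_alloc(size); }
    static void* operator new[](std::size_t size) { return aligned16_alloc(size); }
    static void  operator delete(void* p)   { aligned16_free(p); }
    static void  operator delete[](void* p) { aligned16_free(p); }
};
```

With this in place, "Camera* c = new Camera;" goes through aligned16_alloc, and "delete c;" goes through aligned16_free. One caveat I am aware of: for new[], some ABIs may prepend bookkeeping ahead of the first element, so check the address of element 0 on your platform before trusting the array form.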
Alternatively, you can create your own custom allocator object -- but I'm not familiar with the caveats / pitfalls / worries of that technique.
Alternatively alternatively, you could perform the alignment yourself by kluge-magic, such as:
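struct my_m128 {
  char m[32]; // 16 bytes of payload plus slack for self-alignment
  operator __m128& () {
    // Skip forward to the first 16-byte boundary inside m.
    std::size_t skip = (16 - ((std::uintptr_t)&m[0] & 0x0F)) & 0x0F;
    return *(__m128*)&m[skip];
  }
};
(This needs <xmmintrin.h>, <cstddef> and <cstdint>.)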
The gotcha is the wasted space (32 bytes to hold 16), which is only worrisome for arrays.
I think your best bet is to write and manage your own __m128-only mini-heap manager.
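To give a rough idea of what I mean, here is a sketch of a fixed-capacity pool that only hands out 16-byte blocks on 16-byte boundaries. Block16Pool and its member names are hypothetical, it is not thread safe, and it assumes a C++11 compiler; treat it as a starting point rather than a finished allocator.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical pool of 16-byte blocks, each aligned on a 16-byte boundary.
class Block16Pool {
    struct Node { Node* next; };  // a free block stores the free-list link

public:
    explicit Block16Pool(std::size_t capacity)
        : raw_(static_cast<unsigned char*>(std::malloc(capacity * 16 + 15))),
          free_list_(nullptr)
    {
        if (!raw_) throw std::bad_alloc();
        // Skip forward to the first 16-byte boundary inside the raw buffer.
        std::uintptr_t addr = reinterpret_cast<std::uintptr_t>(raw_);
        unsigned char* base = raw_ + ((16 - (addr & 0x0F)) & 0x0F);
        // Thread the blocks onto the free list, last block first, so that
        // the block at `base` ends up at the head.
        for (std::size_t i = capacity; i-- > 0; ) {
            free_list_ = new (base + i * 16) Node{free_list_};
        }
    }

    ~Block16Pool() { std::free(raw_); }

    Block16Pool(const Block16Pool&) = delete;
    Block16Pool& operator=(const Block16Pool&) = delete;

    // Returns a 16-byte-aligned block, or nullptr when the pool is exhausted.
    void* allocate() {
        if (!free_list_) return nullptr;
        Node* n = free_list_;
        free_list_ = n->next;
        return n;
    }

    // Returns a block obtained from allocate() to the pool.
    void deallocate(void* p) {
        if (!p) return;
        free_list_ = new (p) Node{free_list_};
    }

private:
    unsigned char* raw_;  // pointer returned by malloc, kept for free()
    Node* free_list_;     // head of the singly linked list of free blocks
};
```

A class holding a single __m128 could route its operator new and operator delete through one shared pool like this. The fixed capacity keeps the sketch short; a real one would need a growth policy.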
>Are any simple libraries available?
Not to my knowledge. I do know that there are several high performance heap replacement libraries (each one is tuned for different performance characteristics) -- but I do not know the details about any of them. I wouldn't be surprised if one or more of them can be tuned to allocate only on 16-byte boundaries.
Side note: some heap management libraries are useful for debugging -- double deletes / double frees, overruns, underruns, scrubbing deallocated memory with a known garbage value (e.g., 0xDEADBEEF), unreleased memory at program termination (leaks), et cetera. These can be very useful tools in the developer's arsenal.
--Eljay