Re: alignment issues for sse

Hi Eljay and everybody -

The memory allocation seems to have solved part of my problem, but
alas, I still get segfaults.

Here is the current state of the problematic function:
---------------------------------------------------------------------------------------
inline Vector3<float> operator * (float r,const Vector3<float> &v){
    Vector3<float> rval;
    
    __m128 mv __attribute__ ((aligned(16))) = _mm_set1_ps(r);
    
    rval.m.v = _mm_mul_ps(mv,v.m.v);
    return rval;
}
---------------------------------------------------------------------------------------

From the gdb snippets below, you can see that neither rval nor mv is
being allocated on a 16-byte boundary (both addresses end in 0x4, so
they are only 4-byte aligned).  Grrrr.  I also tried declaring mv as
__m128 __attribute__((aligned(16))) mv.  Same problem.

Perhaps it is because my function is inline that gcc is unable to
align these variables properly?  In any case, this is disturbing
behavior.  Is there some rule I'm not aware of?  Perhaps this is a bug
that has been fixed since 3.3.5?

A fellow lab mate of mine has said that he just uses global variables
because at some point he was having weird alignment issues, but my
code is (currently) multithreaded, so that would be an extremely bad
idea for me.

gdb spew:
-------------------------------------------------------------------------------------------
mv config 1:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16386 (LWP 13626)]
0x0804d931 in _mm_set1_ps (__F=0) at xmmintrin.h:881
881       __v4sf __tmp = __builtin_ia32_loadss (&__F);
(gdb) up
#1  0x0804e123 in operator* (r=0, v=@0x8153430) at vector3.hxx:431
431         __m128 __attribute__ ((aligned(16))) mv = _mm_set1_ps(r);
(gdb) p &mv
$1 = (__m128 *) 0xbf7ff7d4

mv config 2:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 16386 (LWP 13895)]
0x0804d931 in _mm_set1_ps (__F=0) at xmmintrin.h:881
881       __v4sf __tmp = __builtin_ia32_loadss (&__F);
(gdb) up
#1  0x0804e123 in operator* (r=0, v=@0x8153430) at vector3.hxx:431
431         __m128 mv __attribute__ ((aligned(16))) = _mm_set1_ps(r);
(gdb) p &mv
$1 = (__m128 *) 0xbf7ff7d4
(gdb) p &rval
$2 = (Vector3<float> *) 0xbf7ff854
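
For reference, an address is 16-byte aligned only when its low four bits are zero. Here is a minimal sketch of that check applied to the two addresses gdb printed above. The names `is_aligned` and `report_gdb_addresses` are hypothetical helpers made up for illustration, not part of vector3.hxx:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>

// Hypothetical helper: true when addr is a multiple of alignment.
// alignment must be a non-zero power of two.
inline bool is_aligned(std::uintptr_t addr, std::uintptr_t alignment) {
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr & (alignment - 1)) == 0;
}

// Prints whether each address from the gdb session sits on a 16-byte boundary.
inline void report_gdb_addresses() {
    const std::uintptr_t addrs[] = { 0xbf7ff7d4u, 0xbf7ff854u };  // &mv, &rval
    for (std::uintptr_t a : addrs) {
        std::printf("%#lx: 16-byte aligned? %s\n",
                    static_cast<unsigned long>(a),
                    is_aligned(a, 16) ? "yes" : "no");
    }
}
```

The same helper can be dropped into an assert at the top of a function that touches __m128 data, which turns a mysterious segfault into a readable failure.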

Thanks,
  Brian


On Wed, 16 Feb 2005 07:01:17 -0600, Eljay Love-Jensen <eljay@xxxxxxxxx> wrote:
> Hi Brian,
> 
>  >Could the problem be that a Camera class cannot be allocated on the heap
> in such a way that allows 16 byte alignment of the vector data types?
> 
> Oh yes, I believe that is very possibly the problem.
> 
> On my system, it appears that the memory allocation is fixed as if
> __attribute__((aligned(8))) is imposed on the allocation.  (This has no
> bearing on padding.)
> 
> For example:
> struct three { char m[3]; };
> three* p = new three[4];
> 
> The addresses could be...
> &p[0] == 0x10008;
> &p[1] == 0x1000B;
> &p[2] == 0x1000E;
> &p[3] == 0x10011;
> 
> Notice that the first one is aligned on an 8-byte boundary.
> 
> The alignment "promise" of the heap management subsystem is platform
> dependent.  As far as I am aware, there is no standard means to communicate
> alignment requirements to the heap manager.  :-(
> 
> Some heap managers, such as the one with SAS/C++, have lots of knobs to
> programmatically tweak heap manager behavior.  But that kind of API is not
> standard C or C++, and I'd be hesitant to rely upon it if portability is a
> concern (and for me, it is always a concern).
> 
>  >This occurs to me now because of what you said earlier about allocating
> by malloc, and also because my test program ONLY included objects on the stack.
> 
> Serendipitous comment!  :-)
> 
>  >If this is the case, do I need to use a special memory allocator that
> does aligned heap allocations?
> 
> Yes.  In C++, you can override the new, new[], delete and delete[]
> operators of your class and instrument in the desired alignment behavior.
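
A minimal sketch of what such class-level overrides might look like. Everything named here is an assumption for illustration: `aligned_malloc16` and `aligned_free16` are hypothetical helpers that over-allocate with malloc, and `Vec4` is a stand-in class that uses float[4] in place of __m128 so the sketch compiles without the SSE headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical helper: returns a 16-byte aligned block of at least `size`
// bytes. It over-allocates and stores the pointer malloc returned just below
// the aligned address so that aligned_free16 can recover it.
inline void* aligned_malloc16(std::size_t size) {
    const std::size_t align = 16;
    void* raw = std::malloc(size + align + sizeof(void*));
    if (!raw) return 0;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t aligned = (start + align - 1) & ~static_cast<std::uintptr_t>(align - 1);
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Frees a block obtained from aligned_malloc16 (null is allowed).
inline void aligned_free16(void* p) {
    if (p) std::free(static_cast<void**>(p)[-1]);
}

// Stand-in for a class that holds SSE data.
struct Vec4 {
    float v[4];

    static void* operator new(std::size_t size) {
        void* p = aligned_malloc16(size);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete(void* p) { aligned_free16(p); }

    // Array forms. Caveat: for types with a non-trivial destructor the
    // compiler may place a bookkeeping prefix before the first element,
    // which would shift the elements off the aligned address.
    static void* operator new[](std::size_t size) {
        void* p = aligned_malloc16(size);
        if (!p) throw std::bad_alloc();
        return p;
    }
    static void operator delete[](void* p) { aligned_free16(p); }
};
```

With this in place, `new Vec4` and `delete` go through the aligned helpers. Note that a class which merely contains a Vec4 member does not inherit the behavior; the enclosing class needs its own overrides.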
> 
> Alternatively, you can create your own custom allocator object -- but I'm
> not familiar with the caveats / pitfalls / worries of that technique.
> 
> Alternatively alternatively, you could perform the alignment yourself by
> kluge-magic, such as:
> 
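> struct my_m128
> {
>    char m[32]; // 16 bytes of payload plus slack for manual alignment.
>    // Skip forward to the first 16-byte boundary inside m.
>    operator __m128& () { return *(__m128*)(&m[(16 - ((size_t)(&m[0]) & 0x0F)) & 0x0F]); }
> };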
> 
> The gotcha is the wasted space, which is only worrisome for arrays.
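
One more thing to watch with a self-aligning buffer like this: the padding depends on the object's own address, so a plain byte-for-byte copy into an object at a different address can leave the payload at the wrong offset. Below is a rough sketch of a copy-safe variant. `SelfAligned4` is a hypothetical name, and four floats stand in for __m128 so the sketch builds without the SSE headers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical wrapper: 16 payload bytes plus 15 bytes of slack, with the
// payload always located at the first 16-byte boundary inside the buffer.
class SelfAligned4 {
public:
    SelfAligned4() { std::memset(buf_, 0, sizeof buf_); }

    // Copy the payload, not the raw buffer: the padding differs per object.
    SelfAligned4(const SelfAligned4& other) {
        std::memcpy(address(), other.address(), 16);
    }
    SelfAligned4& operator=(const SelfAligned4& other) {
        std::memmove(address(), other.address(), 16);  // safe for self-assignment
        return *this;
    }

    // 16-byte aligned address of the payload.
    void* address() { return buf_ + pad(); }
    const void* address() const { return buf_ + pad(); }

    void store(const float in[4]) { std::memcpy(address(), in, 16); }
    void load(float out[4]) const { std::memcpy(out, address(), 16); }

private:
    // Bytes to skip from buf_ to reach the next 16-byte boundary (0..15).
    std::size_t pad() const {
        std::uintptr_t a = reinterpret_cast<std::uintptr_t>(buf_);
        return static_cast<std::size_t>((16 - (a & 15)) & 15);
    }

    unsigned char buf_[16 + 15];
};
```

Because the offset is recomputed from the object's address on every access, this works for stack objects as well as heap objects, at the cost of the extra bytes per value.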
> 
> I think your best bet is to manage your own __m128-only mini-heap manager.
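
As a rough illustration of the mini-heap idea, here is a sketch of a fixed-capacity pool that hands out 16-byte slots on 16-byte boundaries. `Pool16` and its member names are hypothetical, and the sketch assumes one slot per __m128-sized value. It is not thread-safe as written, so a multithreaded program would need one pool per thread or a lock around allocate and release.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical fixed-capacity pool of 16-byte, 16-byte aligned slots.
class Pool16 {
    struct Slot { Slot* next; };  // a free slot stores the link to the next free slot

public:
    explicit Pool16(std::size_t slots)
        : raw_(static_cast<unsigned char*>(std::malloc(slots * 16 + 15))), free_(0) {
        assert(raw_ != 0);
        // Round the start of the buffer up to the next 16-byte boundary.
        std::uintptr_t base = (reinterpret_cast<std::uintptr_t>(raw_) + 15)
                              & ~static_cast<std::uintptr_t>(15);
        unsigned char* first = reinterpret_cast<unsigned char*>(base);
        // Push slots in reverse so the first allocate() returns slot 0.
        for (std::size_t i = slots; i-- > 0; ) {
            Slot* s = reinterpret_cast<Slot*>(first + i * 16);
            s->next = free_;
            free_ = s;
        }
    }
    ~Pool16() { std::free(raw_); }

    // Returns an aligned 16-byte slot, or null when the pool is exhausted.
    void* allocate() {
        if (!free_) return 0;
        Slot* s = free_;
        free_ = s->next;
        return s;
    }

    // Returns a slot to the pool; p must have come from allocate() on this pool.
    void release(void* p) {
        if (!p) return;
        Slot* s = static_cast<Slot*>(p);
        s->next = free_;
        free_ = s;
    }

private:
    Pool16(const Pool16&);             // non-copyable
    Pool16& operator=(const Pool16&);

    unsigned char* raw_;  // pointer returned by malloc, kept for free()
    Slot* free_;          // head of the free list
};
```

The free list lives inside the unused slots themselves, so the only overhead is the 15 bytes of slack at the front of the buffer.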
> 
>  >Are any simple libraries available?
> 
> Not to my knowledge.  I do know that there are several high performance
> heap replacement libraries (each one is tuned for different performance
> characteristics) -- but I do not know the details about any of them.  I
> wouldn't be surprised if one or more of them can be tuned to allocate only
> on 16-byte-aligned addresses.
> 
> Side note:  some heap management libraries are useful for debugging --
> double deletes / double free, overruns, underruns, scrubbing deallocated
> memory with a known garbage value (e.g., 0xDEADBEEF), unreleased memory at
> program termination (leaks), et cetera.  These can be very useful tools
> in the developer's arsenal.
> 
> --Eljay
> 
>
