Hi guys -

I use posix_memalign to allocate memory on the heap... this works
great. My problem now is that I'm declaring variables on the stack
and they're not being aligned.

So I wrote a little class which will soon (hopefully) be thread global
(using __thread), that uses posix_memalign to basically allocate a
stack of aligned addresses that I can use. I am doing this because
I'm under the impression that doing a malloc and free in each call
would be rather costly.

If (when!) I get it to work, I'll post it.

Corey, you talk about advancing the pointer until it is aligned... is
there a trick like that for the stack?

  Brian

On Thu, 17 Feb 2005 12:10:47 +0200 (EET), Kimmo Fredriksson
<kfredrik@xxxxxxxxxxxxx> wrote:
> On Wed, 16 Feb 2005, corey taylor wrote:
>
> > I see the definition, but I've never seen any documentation on them.
> > Can you point to some useful documentation?
>
> See Intel C++ Compiler User's Guide (copy-pasted):
>
>   Use the _mm_malloc and _mm_free intrinsics to allocate and free aligned
>   blocks of memory. These intrinsics are based on malloc and free, which
>   are in the libirc.a library. You need to include malloc.h. The syntax for
>   these intrinsics is as follows:
>
>   void* _mm_malloc (int size, int align)
>
>   void _mm_free (void *p)
>
>   The _mm_malloc routine takes an extra parameter, which is the alignment
>   constraint. This constraint must be a power of two. The pointer that is
>   returned from _mm_malloc is guaranteed to be aligned on the specified
>   boundary.
>
>   Note
>
>   Memory that is allocated using _mm_malloc must be freed using _mm_free .
>   Calling free on memory allocated with _mm_malloc or calling _mm_free on
>   memory allocated with malloc will cause unpredictable behavior.
>
> From gcc's version of xmmintrin.h:
>
> /* Implemented from the specification included in the Intel C++ Compiler
>    User Guide and Reference, version 8.0.  */
>
> So gcc implements these for icc compatibility.
>
> K
>
> >
> > Currently, we develop for many platforms, so portability is better in
> > most instances although we do use MMX and some SSE for speed where
> > available.
> >
> > corey
> >
> >
> > On Thu, 17 Feb 2005 01:51:44 +0200 (EET), Kimmo Fredriksson
> > <kfredrik@xxxxxxxxxxxxx> wrote:
> >> Hi,
> >>
> >> [Disclaimer: I haven't really been following this discussion...]
> >>
> >> On Wed, 16 Feb 2005, corey taylor wrote:
> >>
> >>> However, after looking into the current public project I'm on, I
> >>> realize that it doesn't use SSE for the allocation. It simply
> >>> advances to an aligned location and manually forces the alignment,
> >>> hides the actual allocation pointer, and returns the aligned pointer.
> >>
> >> Why not use:
> >>
> >> void * _mm_malloc (size_t size, size_t alignment)
> >> void _mm_free (void * ptr)
> >>
> >> ?
> >>
> >> Defined in xmmintrin.h (I think).
> >>
> >>> On Wed, 16 Feb 2005 17:58:15 +0100, Brian Budge <brian.budge@xxxxxxxxx> wrote:
> >>
> >>>> On Wed, 16 Feb 2005 10:46:54 -0600, corey taylor <corey.taylor@xxxxxxxxx> wrote:
> >>>>> Implementation's I've used and worked on always do aligned allocations
> >>>>> manually. Typically the hidden and real sizes of the allocation are
> >>>>> put into the memory allocation itself and the returned pointer is
> >>>>> incremented a few bytes. The downside to this is that you must be
> >>>>> strict in using the aligned free routine also.
> >>
> >> See above.
> >>
> >>>>> On Wed, 16 Feb 2005 10:09:27 -0600, Eljay Love-Jensen <eljay@xxxxxxxxx> wrote:
> >>
> >>>>>>> But surely thousands of people are writing sse code... how do they make
> >>>>>>> it work?
> >>>>>>
> >>>>>> I presume by taking measures to assure the SSE structs are properly
> >>>>>> aligned.
> >>>>>>
> >>>>>>> Do I need to switch to the intel compiler/linker?
> >>>>>>
> >>>>>> I do not know.
> >>
> >> I do not know either, but that was my solution...
> >>
> >> But: my sse code used to work just fine with gcc.
> >> Then something happened,
> >> and I just get seg faults. Don't remember exactly anymore, but I think at
> >> the time it actually worked with gcc, I was using some early gcc 3.4
> >> snapshot, since it was the only one that worked. No version before, no
> >> version after (that I have tried, excluding e.g. 4.0)... And of course
> >> there is also the possibility that something else changed, I do/did
> >> something wrong, etc. Anyways, currently I use icc for sse code, and use
> >> _mm_malloc/_mm_free for dynamic allocation, statics are automagically 16
> >> byte aligned.
> >>
> >> For other things, I still use mostly gcc.
> >>
> >> K
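
For reference, the manual approach corey describes in the quoted text
(over-allocate, advance to an aligned address, hide the real allocation
pointer, return the aligned one) might look roughly like the sketch
below. The names manual_aligned_malloc, manual_aligned_free and
align_ptr are made up for illustration and are not taken from any
project or compiler mentioned in this thread; where the hidden pointer
is stored is likewise an assumption. align_ptr is just the rounding
step on its own, which is one way the same trick could be applied to an
over-sized buffer, including a local one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round addr up to the next multiple of align (a power of two). */
uintptr_t align_up(uintptr_t addr, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (addr + (align - 1)) & ~(uintptr_t)(align - 1);
}

/* Hypothetical helper: return the first aligned address at or after p.
   The buffer behind p must be at least align - 1 bytes larger than the
   space actually needed. */
void *align_ptr(void *p, size_t align)
{
    return (void *)align_up((uintptr_t)p, align);
}

/* Hypothetical helper: over-allocate with malloc, advance to an aligned
   address, and hide the pointer malloc returned just below that address. */
void *manual_aligned_malloc(size_t size, size_t align)
{
    if (align == 0 || (align & (align - 1)) != 0)
        return NULL;                      /* alignment must be a power of two */

    /* Extra room: worst-case padding plus the hidden pointer. */
    size_t extra = (align - 1) + sizeof(void *);
    if (extra < align - 1 || size > SIZE_MAX - extra)
        return NULL;                      /* size computation would overflow */

    unsigned char *raw = malloc(size + extra);
    if (raw == NULL)
        return NULL;

    unsigned char *aligned = align_ptr(raw + sizeof(void *), align);

    /* memcpy avoids a misaligned store when align < sizeof(void *). */
    memcpy(aligned - sizeof(void *), &raw, sizeof raw);
    return aligned;
}

/* Recover the hidden pointer and hand it back to free. */
void manual_aligned_free(void *p)
{
    if (p == NULL)
        return;
    void *raw;
    memcpy(&raw, (unsigned char *)p - sizeof(void *), sizeof raw);
    free(raw);
}
```

As with _mm_malloc/_mm_free, a pointer obtained from
manual_aligned_malloc has to go back through manual_aligned_free and
never through plain free. For a local buffer, declaring it align - 1
bytes larger than needed and passing it through align_ptr gives an
aligned address inside it; whether that is good enough for SSE loads on
a given compiler is something I have not verified here.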