Hi,

I have tested the GCC 4.6.0 RC on Intel systems with good results so far. Now I tested it on an AMD Magny-Cours using the -march=barcelona flag, and GCC translated the _mm_store_pd and _mm_store_ps calls in the code into streaming (non-temporal) stores in the resulting binary.

Where does this "optimization" come from, and how can I disable it? It does not make much sense for a working set that fits into the cache.

Is this intended behavior or a bug?

Cheers,
Matthias

-- 
Dipl.-Phys. Matthias Kretz
http://compeng.uni-frankfurt.de/?mkretz
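
For anyone who wants to reproduce this, here is a minimal sketch of the kind of code in question. The function name scale_in_place and its loop are hypothetical and not taken from the original code; the only assumption is that the buffer is 16-byte aligned and small enough to stay in cache.

```c
#include <stddef.h>
#include <emmintrin.h>  /* SSE2 intrinsics: _mm_load_pd, _mm_mul_pd, _mm_store_pd */

/* Hypothetical reproducer: multiply every element of a small buffer by
 * `factor`, in place. `data` must be 16-byte aligned and `n` must be even. */
void scale_in_place(double *data, size_t n, double factor)
{
    const __m128d f = _mm_set1_pd(factor);
    size_t i;

    for (i = 0; i < n; i += 2) {
        __m128d v = _mm_load_pd(data + i);
        /* An ordinary aligned store: this should normally become movapd.
         * The report above is that a streaming store shows up instead. */
        _mm_store_pd(data + i, _mm_mul_pd(v, f));
    }
}
```

Compiling this with something like gcc -O2 -march=barcelona -S and searching the assembly for movntpd (the streaming store) versus movapd (the regular aligned store) should show which instruction the compiler picked. Whether this small case triggers the same transformation as the original code is not confirmed.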