Hi,

On Monday 21 March 2011 18:00:38 Brian Budge wrote:
> On Mon, Mar 21, 2011 at 7:49 AM, Matthias Kretz wrote:
> > On Monday 21 March 2011 15:23:02 Matthias Kretz wrote:
> >> I tested the GCC 4.6.0 RC on Intel systems with good success so far. Now
> >> I tested on an AMD Magny-Cours using the -march=barcelona flag and gcc
> >> translated _mm_store_pd/s calls in the code to streaming stores in the
> >> resulting binary.
> >>
> >> Where does this "optimization" come from and how can I disable it? This
> >> doesn't make much sense on a working set that fits into the cache...
> >>
> >> Is this intended behavior or a bug?
> >
> > Additional info: If I add -fno-prefetch-loop-arrays I get normal stores
> > as expected. I don't consider this a solution, though.
>
> Do you mean _mm_stream_pd/s? I think store will still take your
> values to cache...

I mean that I wrote _mm_store_pd/s in my code but I got _mm_stream_pd/s
instead. Only if I compile with -fno-prefetch-loop-arrays do I actually get
non-streaming stores.

Regards,
Matthias

-- 
Dipl.-Phys. Matthias Kretz
http://compeng.uni-frankfurt.de/?mkretz
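
A minimal sketch of the kind of loop under discussion, for anyone who wants to try to reproduce this. It is an assumption, not the code from the report: the function name `scale_store` and the loop body are made up for illustration, and it has not been verified that this exact loop triggers the store-to-streaming-store transformation.

```c
#include <assert.h>
#include <stddef.h>
#include <emmintrin.h>  /* SSE2: _mm_load_pd, _mm_mul_pd, _mm_store_pd */

/* Hypothetical reproducer: multiply each element of src by factor and
 * write the result to dst with ordinary aligned SSE2 stores.
 * Both pointers must be 16-byte aligned and n must be even. */
void scale_store(double *dst, const double *src, size_t n, double factor)
{
    assert(n % 2 == 0);
    const __m128d f = _mm_set1_pd(factor);
    for (size_t i = 0; i < n; i += 2) {
        __m128d v = _mm_mul_pd(_mm_load_pd(src + i), f);
        /* Written as a normal (cached) store; the report is that GCC
         * emitted a non-temporal store here instead. */
        _mm_store_pd(dst + i, v);
    }
}
```

One way to check would be to compile this with `gcc -O2 -march=barcelona -S`, then again with `-fno-prefetch-loop-arrays` added, and compare the store instruction in the two assembly listings: `movapd` is a normal aligned store, `movntpd` is a streaming store.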