Matthias Kretz <kretz@xxxxxxxxxxxxxxxxxxxxxxxx> writes:

> On Monday 21 March 2011 15:23:02 Matthias Kretz wrote:
>> I tested the GCC 4.6.0 RC on Intel systems with good success so far. Now I
>> tested on an AMD Magny-Cours using the -march=barcelona flag and gcc
>> translated _mm_store_pd/s calls in the code to streaming stores in the
>> resulting binary.
>>
>> Where does this "optimization" come from and how can I disable it? This
>> doesn't make much sense on a working set that fits into the cache...
>>
>> Is this intended behavior or a bug?
>
> Additional info: If I add -fno-prefetch-loop-arrays I get normal stores as
> expected. I don't consider this a solution, though.

That is precisely where this optimization is coming from. The
vectorizer pretty much assumes that the working set doesn't fit in the
cache.

I think it would be reasonable to have an option to control this.
Please consider filing a bug report as described at
http://gcc.gnu.org/bugs/ , ideally with a test case.

Ian
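
For reference, a reduced test case for such a report might look roughly like the sketch below. It is not taken from the original code: the function name scale_store, its signature, and the loop body are assumptions made for illustration. It only shows the pattern under discussion, an explicit _mm_store_pd inside a simple loop over aligned arrays.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2: _mm_set1_pd, _mm_load_pd, _mm_mul_pd, _mm_store_pd */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical reduced test case; name and signature are illustrative only.
   dst and src must be 16-byte aligned and n must be even. */
void scale_store(double *dst, const double *src, size_t n, double factor)
{
    const __m128d f = _mm_set1_pd(factor);
    size_t i;

    /* _mm_load_pd and _mm_store_pd require 16-byte aligned addresses. */
    assert(((uintptr_t)dst & 15) == 0 && ((uintptr_t)src & 15) == 0);
    assert(n % 2 == 0);

    for (i = 0; i < n; i += 2) {
        /* Explicitly requested regular store; the report above is that
           this was emitted as a streaming (non-temporal) store. */
        _mm_store_pd(dst + i, _mm_mul_pd(_mm_load_pd(src + i), f));
    }
}
```

Compiling this to assembly with -march=barcelona, once with and once without -fno-prefetch-loop-arrays, and comparing the store instructions in the loop (movapd versus movntpd) should show whether the transformation applies. Which optimization level triggers it is an assumption to be checked, not something stated in the thread.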