Re: _mm_store_pd/s translated to movntpd/s

Matthias Kretz <kretz@xxxxxxxxxxxxxxxxxxxxxxxx> · Mon, 21 Mar 2011 18:55:59 +0100

Hi,

On Monday 21 March 2011 18:00:38 Brian Budge wrote:
> On Mon, Mar 21, 2011 at 7:49 AM, Matthias Kretz wrote:
> > On Monday 21 March 2011 15:23:02 Matthias Kretz wrote:
> >> I tested the GCC 4.6.0 RC on Intel systems with good success so far. Now
> >> I tested on an AMD Magny-Cours using the -march=barcelona flag and gcc
> >> translated _mm_store_pd/s calls in the code to streaming stores in the
> >> resulting binary.
> >> 
> >> Where does this "optimization" come from and how can I disable it? This
> >> doesn't make much sense on a working set that fits into the cache...
> >> 
> >> Is this intended behavior or a bug?
> > 
> > Additional info: If I add -fno-prefetch-loop-arrays I get normal stores
> > as expected. I don't consider this a solution, though.
> 
> Do you mean _mm_stream_pd/s?  I think store will still take your
> values to cache...

I mean that I wrote _mm_store_pd/s in my code but I got _mm_stream_pd/s 
instead. Only if I compile with -fno-prefetch-loop-arrays do I actually get 
non-streaming stores.
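This is a minimal sketch of the two cases being distinguished. It assumes SSE2 and a hypothetical array-scaling kernel. The names scale_store and scale_stream are illustrative and do not come from the original code. Both functions compute the same values, and the only intended difference is the store instruction. To see which instruction GCC actually emitted, you could compile with something like `gcc -O3 -march=barcelona -S` (the -O3 level is an assumption) and search the assembly for an ordinary aligned store (movapd/movaps) versus movntpd.

```c
#include <stddef.h>
#include <emmintrin.h> /* SSE2: _mm_load_pd, _mm_mul_pd, _mm_store_pd, _mm_stream_pd */

/* Hypothetical kernel: dst[i] = src[i] * factor.
 * Assumes dst and src are 16-byte aligned and n is even.
 * Uses _mm_store_pd, i.e. an ordinary cacheable store. Per the report in
 * this thread, GCC 4.6.0 RC with -march=barcelona emitted movntpd for such
 * stores unless -fno-prefetch-loop-arrays was given. */
void scale_store(double *dst, const double *src, size_t n, double factor)
{
    const __m128d f = _mm_set1_pd(factor);
    for (size_t i = 0; i < n; i += 2) {
        __m128d v = _mm_load_pd(src + i);
        _mm_store_pd(dst + i, _mm_mul_pd(v, f));
    }
}

/* Same kernel with explicit non-temporal stores (movntpd). These hint the
 * CPU to write around the cache. That can help for large outputs that are
 * not reread soon, but it is typically counterproductive for a working set
 * that fits in cache. */
void scale_stream(double *dst, const double *src, size_t n, double factor)
{
    const __m128d f = _mm_set1_pd(factor);
    for (size_t i = 0; i < n; i += 2) {
        __m128d v = _mm_load_pd(src + i);
        _mm_stream_pd(dst + i, _mm_mul_pd(v, f));
    }
    _mm_sfence(); /* make the weakly-ordered streaming stores globally visible */
}
```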

Regards,
	Matthias

-- 
Dipl.-Phys. Matthias Kretz
http://compeng.uni-frankfurt.de/?mkretz