Support of non-temporal stores on ia32/Core2?

Hi,

I just learned that for memory-bound codes one can improve the
memory bandwidth significantly using non-temporal stores that are
part of SSE2.  However, I could not find out how to get gcc to
generate corresponding code on a Core2.

For the stream triad kernel,

subroutine stream_kernel_triad (a, b, c, n, s)
  integer         , intent(in) :: n
  double precision             :: a(*), b(*), c(*)
  double precision, intent(in) :: s

  integer :: j
  do j = 1,n
     a(j) = b(j) + s*c(j)
  end do
end subroutine stream_kernel_triad

the Intel compiler gains 25% in performance going from
"-opt-streaming-stores never" to "... auto" (the default) or
"... always".  On my system, gcc-4.8's performance for this fragment
exactly matches that of the Intel compiler without NT stores.
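
To make concrete what a non-temporal store version of this loop looks
like, here is a minimal hand-written sketch in C using the SSE2
intrinsic _mm_stream_pd.  It is not compiler output, only an
illustration: the function name triad_nt is made up, a[] is assumed to
be at least 8-byte aligned, and the code falls back to ordinary stores
when SSE2 is not available.

```c
#include <stddef.h>
#include <stdint.h>
#ifdef __SSE2__
#include <emmintrin.h>   /* SSE2 intrinsics, also pulls in _mm_sfence */
#endif

/* Hypothetical helper: a[j] = b[j] + s*c[j] for j = 0..n-1,
   writing a[] with non-temporal (streaming) stores where possible.
   Assumes a is at least 8-byte aligned, as any normal double array is. */
void triad_nt(double *a, const double *b, const double *c,
              size_t n, double s)
{
    size_t j = 0;
#ifdef __SSE2__
    /* _mm_stream_pd requires a 16-byte aligned destination, so peel
       one scalar iteration if a[] starts 8 bytes off such a boundary. */
    if (n > 0 && ((uintptr_t)a & 15u) != 0) {
        a[0] = b[0] + s * c[0];
        j = 1;
    }
    const __m128d vs = _mm_set1_pd(s);
    for (; j + 2 <= n; j += 2) {
        __m128d vb = _mm_loadu_pd(b + j);   /* unaligned loads are fine */
        __m128d vc = _mm_loadu_pd(c + j);
        /* Store two doubles, bypassing the cache hierarchy. */
        _mm_stream_pd(a + j, _mm_add_pd(vb, _mm_mul_pd(vs, vc)));
    }
    /* Make the streaming stores globally visible before returning. */
    _mm_sfence();
#endif
    /* Scalar remainder (or the whole loop when SSE2 is unavailable). */
    for (; j < n; ++j)
        a[j] = b[j] + s * c[j];
}
```

The point of the streaming store is that a[] is written without first
being read into the cache, which saves memory traffic for a kernel
like this one that never reads a[] back.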

Is there some trick or magic flag I need to specify?

Harald



