Re: prefetching on pentium 4


 



--- Tim Prince <timothyprince@xxxxxxxxxxxxx> wrote:

> ranjith kumar wrote:
> > Hi,
> > 
> >    1) Will "gcc" insert prefetch instructions automatically
> > on a "pentium 4" processor? Which flags should be enabled
> > while compiling so that gcc automatically inserts prefetch
> > instructions?
> > 
> > 2) Or does the programmer have to call some functions?
> >    If so, what is the syntax of those functions?
> > 
> P4 isn't suitable for automatic compiler-generated prefetch.
> Default hardware prefetch (stride-based and cache line pairs)
> is quite effective.  Prefetch intrinsics are available with
> #include <xmmintrin.h>.  Details on what works vary with
> steppings.  The earliest P4 models could accelerate hardware
> prefetch by the program issuing 3 cache lines of prefetch
> prior to entering a loop.  Since Northwood, that doesn't work.
> Since Prescott, prefetch hints are ignored on P4, with
> prefetch going to L2 regardless of hints.  Effect of prefetch
> on DTLB misses also is model dependent.
> Contrary to what certain Windows related docs say,
> _mm_prefetch() works the same on all compilers which
> implement it.
> 

Hi,
   1) What is the difference between the "prefetchnta" and
"prefetcht1" instructions on a Pentium 4 processor?
   The IA-32 optimization manual says that prefetcht0,
prefetcht1 and prefetcht2 are identical on the Pentium 4
processor, and that prefetchnta fetches the data into the
second-level cache "minimizing cache pollution". What does
"minimizing cache pollution" mean?
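
For reference, the hint is chosen by the second argument of _mm_prefetch(), which must be a compile-time constant such as _MM_HINT_NTA or _MM_HINT_T0 from <xmmintrin.h>. The following is only a minimal sketch: the function names sum_nta and sum_t0 and the "ahead" distance are hypothetical, chosen for illustration. A prefetch is just a hint, so both functions return the same sum; only the cache behaviour may differ, and how it differs depends on the processor model.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Sum n ints, hinting "ahead" elements in front with prefetchnta
   (non-temporal: intended for data that will not be reused soon). */
long sum_nta(const int *v, size_t n, size_t ahead)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n)
            _mm_prefetch((const char *)&v[i + ahead], _MM_HINT_NTA);
        s += v[i];
    }
    return s;
}

/* Same loop, but with the prefetcht0 hint (temporal data). */
long sum_t0(const int *v, size_t n, size_t ahead)
{
    long s = 0;
    for (size_t i = 0; i < n; i++) {
        if (i + ahead < n)
            _mm_prefetch((const char *)&v[i + ahead], _MM_HINT_T0);
        s += v[i];
    }
    return s;
}
```

Calling sum_nta(a, n, 16) and sum_t0(a, n, 16) on the same array gives identical results, so the two variants can be timed against each other directly.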

When I compared two programs, the first prefetching data
with "prefetchnta" and the second with "prefetcht0", I
observed that the second program ran faster. What could be
the reason?

p.s.: Below is the program which uses "prefetcht0". To get the
program which uses "prefetchnta", pass 0 (_MM_HINT_NTA) instead
of 3 (_MM_HINT_T0) as the second argument to _mm_prefetch(). I
ran them on a Pentium 4 processor with the Fedora Core operating
system.

#include <stdio.h>
#include <string.h>
#include <xmmintrin.h>

int main(void)
{
    int i, j, h;
    struct list
    {
        long double w, w1, u, u1, x, x1, y, y1, z, z1;
        long double e1, e2, e3, e4;
        long double b1, b2, b3, b4, b5, b6;
        long int a, b, c, d, e;
    } l[5000];
    int total = 0;              /* was uninitialized in the original */

    memset(l, 0, sizeof l);     /* give the array defined contents */

    for (h = 0; h < 9; h++)
        for (j = 0; j < 99999; j++)
            for (i = 0; i < 1000; i++)
            {
                /* k = rand() % 500; */
                /* hint 3 is _MM_HINT_T0; 0 would be _MM_HINT_NTA */
                _mm_prefetch((const char *)&l[i + 2].a, 3);

                /* note: reads element i*5, not the prefetched i+2 */
                total += (l[i*5].a) * (l[i*5].b) * (l[i*5].c)
                       * (l[i*5].d) * (l[i*5].e);

                /* printf("\n %d ", total); */
            }

    printf("\n %d ", total);
    return 0;
}
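
One detail of the listing above is that it prefetches l[i+2] while the loop reads l[i*5], so the prefetched element is not the one that is about to be used. Below is a minimal sketch of a variant, assuming the intent was to prefetch the record the loop will read a few iterations later. The names struct rec, product_sum, STRIDE and DIST are hypothetical, and the distance of 4 iterations is an arbitrary starting point, not a tuned value.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical record with the same shape as "struct list" above. */
struct rec {
    long double pad[20];
    long a, b, c, d, e;
};

#define STRIDE 5   /* the loop touches every 5th record          */
#define DIST   4   /* prefetch this many iterations ahead (guess) */

long product_sum(const struct rec *l, size_t count)
{
    long total = 0;
    for (size_t i = 0; i * STRIDE < count; i++) {
        /* Prefetch the record that will be read DIST iterations later. */
        size_t next = (i + DIST) * STRIDE;
        if (next < count)
            _mm_prefetch((const char *)&l[next].a, _MM_HINT_T0);

        const struct rec *r = &l[i * STRIDE];
        total += r->a * r->b * r->c * r->d * r->e;
    }
    return total;
}
```

Swapping _MM_HINT_T0 for _MM_HINT_NTA here gives the second variant for a timing comparison in which the prefetch address matches the data actually read.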





