--- Tim Prince <timothyprince@xxxxxxxxxxxxx> wrote:

> ranjith kumar wrote:
> > Hi,
> >
> > 1) Will "gcc" insert prefetch instructions
> > automatically on "pentium 4" processor?
> > Which flags should be enabled while compiling so that
> > gcc automatically inserts prefetch instructions?
> >
> > 2) Or does the programmer have to include some functions?
> > If so, what is the syntax of that function?
> >
> P4 isn't suitable for automatic compiler-generated prefetch. Default
> hardware prefetch (stride-based and cache line pairs) is quite
> effective. Prefetch intrinsics are available with #include
> <xmmintrin.h>. Details on what works vary with steppings. The earliest
> P4 models could accelerate hardware prefetch by the program issuing 3
> cache lines of prefetch prior to entering a loop. Since Northwood, that
> doesn't work. Since Prescott, prefetch hints are ignored on P4, with
> prefetch going to L2 regardless of hints. Effect of prefetch on DTLB
> misses also is model dependent.
> Contrary to what certain Windows related docs say, _mm_prefetch() works
> the same on all compilers which implement it.
>

Hi,

1) What is the difference between the "prefetchnta" and "prefetcht1"
instructions on a Pentium 4 processor?

The IA-32 optimization manual says that prefetcht0, prefetcht1 and
prefetcht2 are identical on the Pentium 4. It also says that prefetchnta
fetches the data into the second-level cache, "minimizing cache
pollution". What does "minimizing cache pollution" mean?

I compared two programs, the first prefetching data with "prefetchnta"
and the second with "prefetcht0", and observed that the second program
ran faster. What could be the reason?

p.s: Below is the program which uses "prefetcht0". To get the program
which uses "prefetchnta", pass 0 as the second argument to the
_mm_prefetch() call on line 22. I ran them on a Pentium 4 processor
with the Fedora Core operating system.

 1  #include <stdio.h>
 2  #include <xmmintrin.h>
 3  int main()
 4  {
 5
 6      int i,j,k,h;
 7      struct list
 8      {
 9          long double w,w1,u,u1,x,x1,y,y1,z,z1;
10          long double e1,e2,e3,e4;
11          long double b1,b2,b3,b4,b5,b6;
12          long int a,b,c,d,e;
13      }l[5000];
14
15
16      int total = 0;
17      for(h=0;h<9;h++)
18          for(j=0;j<99999;j++)
19              for(i=0;i<1000;i++)
20              {
21                  //k=rand()%500;
22                  _mm_prefetch((const char *)&l[i+2].a, 3);  /* 3 = _MM_HINT_T0, 0 = _MM_HINT_NTA */
23
24                  total+=(l[i*5].a)*(l[i*5].b)*(l[i*5].c)*(l[i*5].d)*(l[i*5].e);
25
26                  //printf("\n %d ",total);
27              }
28
29      printf("\n %d ",total);
30  }
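
One thing to note about the program above: the prefetch on line 22 targets l[i+2], while line 24 reads l[i*5], so for most iterations the prefetched line is not the one the loop reads next. For comparison, here is a minimal sketch of a strided loop that prefetches the element it will actually read a few iterations later. It is an illustration only, not a measured fix: struct item, sum_strided(), and the stride, dist and use_nta parameters are hypothetical names of mine, and it assumes an x86 compiler that provides <xmmintrin.h>. The hint is chosen with an if/else because _mm_prefetch() generally needs a compile-time constant as its second argument.

```c
#include <stddef.h>
#include <xmmintrin.h>

/* Hypothetical record type; only its size and the summed field matter. */
struct item {
    long a;
    char pad[256 - sizeof(long)];
};

/*
 * Sum field `a` of every `stride`-th element of v[0..n-1] (stride >= 1),
 * prefetching the element that will be read `dist` iterations later.
 * use_nta != 0 selects _MM_HINT_NTA, otherwise _MM_HINT_T0.
 */
static long sum_strided(const struct item *v, size_t n, size_t stride,
                        size_t dist, int use_nta)
{
    long total = 0;
    for (size_t i = 0; i < n; i += stride) {
        size_t ahead = i + dist * stride;
        if (ahead < n) {   /* never form an address past the array */
            const char *p = (const char *)&v[ahead].a;
            if (use_nta)
                _mm_prefetch(p, _MM_HINT_NTA);
            else
                _mm_prefetch(p, _MM_HINT_T0);
        }
        total += v[i].a;
    }
    return total;
}
```

Calling sum_strided(l, 5000, 5, 2, 0) would walk the array with the same stride as line 24 while prefetching two iterations ahead with the T0 hint; passing 1 as the last argument switches to NTA. Whether either hint helps on a given P4 stepping would still have to be measured.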