"(remember, adding instructions makes your code slower)"

I would have agreed with you totally two years ago, before I started writing GPU programs. There, programs that copy a selected part of the data to shared memory, and have about 20% more instructions because of it, run 10 times faster than those that do not.

Thanks for the answer. If you have a nice example of speeding up CPU programs using prefetching, or some good literature, please give a link :).

CentCollector :)

--- On Thu, 11/26/09, Harald Servat <harald.servat@xxxxxx> wrote:

> From: Harald Servat <harald.servat@xxxxxx>
> Subject: Re: cache optimization
> To: "Łukasz" <blurrpp@xxxxxxxxx>
> Cc: gcc-help@xxxxxxxxxxx
> Date: Thursday, November 26, 2009, 4:38 PM
>
> Although there are other factors that may apply, you must consider that
> prefetching only the next element may not be enough for current
> microprocessors. First, I'm pretty sure that the processor fetches the
> entire cache line once you bring in some data, so the next element is
> often already in the cache. Second, microprocessors nowadays have
> hardware prefetchers that do the same work for you without the need to
> add extra instructions (remember, adding instructions makes your code
> slower). Also consider that the microprocessor has out-of-order
> circuitry that may allow it to work on "future" iterations while
> waiting on a cache miss.
>
> So I think that you should prefetch data that is farther ahead. With the
> current gap between processor and memory speeds, I wouldn't be too
> surprised if you had to go 7 or 8 cache lines ahead.
>
> Just my .02 euro cents :)
>
> Łukasz wrote:
> > Hi, I want to learn how to optimize cache usage in gcc. I found the
> > builtin function __builtin_prefetch, which should prefetch data into
> > the cache, so I used the canonical :) example of vector addition.
> >
> > for (i = 0; i < n; i++)
> >   {
> >     a[i] = a[i] + b[i];
> >     __builtin_prefetch (&a[i+1], 1, 1);
> >     __builtin_prefetch (&b[i+1], 0, 1);
> >     /* ... */
> >   }
> >
> > and compiled it with gcc without any special options, but it is
> > slower than
> >
> > for (i = 0; i < n; i++)
> >   {
> >     a[i] = a[i] + b[i];
> >     /* ... */
> >   }
> >
> > So maybe I should compile it with some extra options to take advantage
> > of cache prefetching? (-fprefetch-loop-arrays doesn't work.)
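
To make the "prefetch farther ahead" suggestion from the quoted reply concrete, here is a minimal sketch of the same vector addition that issues a prefetch only once per cache line's worth of elements and looks several lines ahead instead of one element. The function name add_with_prefetch, the 64-byte line size and the 8-line look-ahead are assumptions for illustration, not measured values; whether this beats the plain loop depends on the machine and on how well its hardware prefetcher already handles sequential access.

```c
#include <stddef.h>

/* Assumed values, not measured: 64-byte cache lines and a look-ahead of
   8 lines. Both would need tuning on the target machine. */
#define CACHE_LINE_BYTES 64
#define LINES_AHEAD      8

/* Hypothetical helper: computes a[i] = a[i] + b[i] for i in [0, n),
   prefetching LINES_AHEAD cache lines' worth of elements ahead. */
void add_with_prefetch(float *a, const float *b, size_t n)
{
    const size_t per_line = CACHE_LINE_BYTES / sizeof(float); /* elements per line */
    const size_t ahead    = LINES_AHEAD * per_line;           /* look-ahead in elements */

    for (size_t i = 0; i < n; i++) {
        /* Issue prefetches once every per_line elements, and only when the
           target element is still inside the arrays. */
        if (i % per_line == 0 && i + ahead < n) {
            __builtin_prefetch(&a[i + ahead], 1, 1); /* rw = 1: will be written */
            __builtin_prefetch(&b[i + ahead], 0, 1); /* rw = 0: read only */
        }
        a[i] = a[i] + b[i];
    }
}
```

The result is the same as the plain loop; only the memory access hints differ. Timing both versions with optimization enabled (for example -O2) on arrays much larger than the last-level cache would be the way to check whether the hints help at all.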