Although other factors may apply, consider that prefetching only the next element may not be enough on current microprocessors.

First, I'm pretty sure the processor brings in the entire cache line as soon as you touch any data in it. So prefetching a[i+1] and b[i+1] almost always targets a line that is already in the cache.

Second, current microprocessors have hardware prefetchers that do the same work for you without extra instructions (remember, adding instructions makes your code slower). Also consider that the processor's out-of-order execution circuitry may let it keep working on "future" iterations while it waits on a cache miss.

So I think you should prefetch data that is farther ahead. Given the current gap between processor and memory speeds, I wouldn't be surprised if you had to go 7 or 8 cache lines ahead.

Just my .02 euro cents :)

Łukasz wrote:
> Hi, I want to learn how to optimize cache usage in gcc. I found the builtin
> function __builtin_prefetch, which should prefetch data into the cache, so I
> used the canonical :) example of vector addition.
>
> for (i = 0; i < n; i++)
> {
>   a[i] = a[i] + b[i];
>   __builtin_prefetch (&a[i+1], 1, 1);
>   __builtin_prefetch (&b[i+1], 0, 1);
>   /* ... */
> }
>
> and compiled it with gcc without special options... but it's slower than
>
> for (i = 0; i < n; i++)
> {
>   a[i] = a[i] + b[i];
>   /* ... */
> }
>
> So maybe I should compile it with some extra options to take advantage of
> cache prefetching? (-fprefetch-loop-arrays doesn't work)
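A minimal sketch of the "prefetch farther ahead" idea, applied to the vector addition above. It assumes double arrays, 64-byte cache lines, and a hypothetical tuning knob PREFETCH_LINES set to 8 (following the 7 or 8 lines guess); none of these values come from the original code. Because touching one element pulls in the whole line, it issues the two prefetches only once per cache line rather than on every iteration. The helper vec_add_variants_agree is also hypothetical and only checks that both loops compute the same result.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Assumption: 64-byte cache lines (common on current x86 CPUs). */
#define CACHE_LINE_BYTES 64
/* Hypothetical tuning knob: how many cache lines ahead to prefetch.
   Measure on your own machine; the best value varies by CPU. */
#define PREFETCH_LINES 8

#define ELEMS_PER_LINE (CACHE_LINE_BYTES / sizeof(double))
#define PREFETCH_DIST  (PREFETCH_LINES * ELEMS_PER_LINE)

/* Baseline: rely on the hardware prefetcher only. */
void vec_add_plain(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] = a[i] + b[i];
}

/* Software prefetch PREFETCH_DIST elements ahead, issued once every
   ELEMS_PER_LINE iterations (roughly once per cache line). */
void vec_add_prefetch(double *a, const double *b, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        /* The bounds check keeps &a[i + PREFETCH_DIST] inside the arrays;
           forming a pointer far past the end is undefined in C. */
        if (i % ELEMS_PER_LINE == 0 && i + PREFETCH_DIST < n) {
            __builtin_prefetch(&a[i + PREFETCH_DIST], 1, 1); /* a is written */
            __builtin_prefetch(&b[i + PREFETCH_DIST], 0, 1); /* b is only read */
        }
        a[i] = a[i] + b[i];
    }
}

/* Hypothetical helper: returns 1 if both loops produce identical results,
   0 on mismatch or allocation failure. */
int vec_add_variants_agree(size_t n)
{
    if (n == 0)
        return 1;

    double *a1 = malloc(n * sizeof *a1);
    double *a2 = malloc(n * sizeof *a2);
    double *b  = malloc(n * sizeof *b);
    int ok = (a1 != NULL && a2 != NULL && b != NULL);

    if (ok) {
        for (size_t i = 0; i < n; i++) {
            a1[i] = a2[i] = (double)i;
            b[i] = 0.5 * (double)i;
        }
        vec_add_plain(a1, b, n);
        vec_add_prefetch(a2, b, n);
        for (size_t i = 0; i < n; i++) {
            if (a1[i] != a2[i]) {
                ok = 0;
                break;
            }
        }
    }
    free(a1);
    free(a2);
    free(b);
    return ok;
}
```

Whether the prefetching loop beats the plain one depends on the CPU and on the array size (arrays that fit in cache gain nothing). Compile with optimization (e.g. -O2) before timing, since unoptimized builds mostly measure instruction overhead.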