Re: cache optimization

Harald Servat <harald.servat@xxxxxx> · Thu, 26 Nov 2009 16:38:19 +0100

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Although other factors may apply, you must consider that prefetching
just the next element may not be enough on current microprocessors.
First, I'm pretty sure the processor already has the entire cache line
once you touch any data in it, so a prefetch of the next element
usually targets a line that is already there. Second, microprocessors
nowadays have hardware prefetchers that do the same work for you
without the need for extra instructions (remember, every added
instruction has a cost, which can make your code slower). Also consider
that the processor's out-of-order circuitry may allow it to work on
"future" iterations while waiting on a cache miss.
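
To put a number on the cache line point, here is a minimal sketch. The
64-byte line size is an assumption (common, but check your CPU), and
elems_per_line is just a name made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache line size in bytes; not queried from the hardware. */
#define CACHE_LINE_BYTES 64

/* How many consecutive elements of the given size share one cache line
   (hypothetical helper, for illustration only). */
size_t
elems_per_line (size_t elem_size)
{
  assert (elem_size > 0 && elem_size <= CACHE_LINE_BYTES);
  return CACHE_LINE_BYTES / elem_size;
}
```

Under that assumption, 8-byte elements come 8 to a line, so in a loop
that prefetches element i+1 on every iteration, roughly 7 out of every
8 prefetches ask for a line that is already being used.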

So I think you should prefetch data that is farther ahead. With the
current gap between processor and memory speeds, I wouldn't be
surprised if you had to go 7 or 8 cache lines ahead.
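
Something along these lines might be a starting point. It is only a
sketch: the double element type, the 64-byte cache line, and the
distance of 8 lines are all assumptions that would need tuning on the
real machine, and add_with_prefetch is a made-up name.

```c
#include <assert.h>
#include <stddef.h>

/* Assumptions, not measurements: 64-byte cache lines and a prefetch
   distance of 8 lines.  Both must be tuned for the actual machine. */
#define CACHE_LINE_BYTES 64
#define LINES_AHEAD      8
#define ELEMS_PER_LINE   (CACHE_LINE_BYTES / sizeof (double))
#define PREFETCH_DIST    (LINES_AHEAD * ELEMS_PER_LINE)

/* a[i] += b[i], issuing one prefetch per array per cache line's worth
   of elements instead of one per element. */
void
add_with_prefetch (double *a, const double *b, size_t n)
{
  size_t i, j, end;

  assert (n == 0 || (a != NULL && b != NULL));

  for (i = 0; i < n; i += ELEMS_PER_LINE)
    {
      /* Only prefetch addresses that are still inside the arrays. */
      if (i + PREFETCH_DIST < n)
        {
          __builtin_prefetch (&a[i + PREFETCH_DIST], 1, 1);
          __builtin_prefetch (&b[i + PREFETCH_DIST], 0, 1);
        }

      /* Process the elements of the current block. */
      end = (i + ELEMS_PER_LINE < n) ? i + ELEMS_PER_LINE : n;
      for (j = i; j < end; j++)
        a[j] = a[j] + b[j];
    }
}
```

The arrays are not necessarily aligned to a cache line boundary, so the
blocks here only approximate real lines. Whether this beats the plain
loop at all depends on the hardware prefetcher, so measure both.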

Just my .02 euro cents :)

Łukasz wrote:
> Hi, I want to learn how to optimize cache usage in gcc. I found the builtin function __builtin_prefetch, which should prefetch data into the cache, so I used the canonical :) example of vector addition.
> 
> for (i = 0; i < n; i++)
>   {
>     a[i] = a[i] + b[i];
>     __builtin_prefetch (&a[i+1], 1, 1);
>     __builtin_prefetch (&b[i+1], 0, 1);
>     /* ... */
>   }
> 
> and compiled it with gcc without special options, but it's slower than
> 
> for (i = 0; i < n; i++)
>   {
>     a[i] = a[i] + b[i];
>     /* ... */
>   }
> 
> so maybe I should compile it with some extra options to take advantage of cache prefetching? (-fprefetch-loop-arrays doesn't work)
> 
> 
> 
> 

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.9 (GNU/Linux)

iEYEARECAAYFAksOoOsACgkQwMPeuqUCg9yj9wCbBd7DxNBKk9uNzV5xz4r66He4
r9gAnRncLhV0SYr6MgoUz7qG+hSL8S9b
=t8mz
-----END PGP SIGNATURE-----