Re: cache optimization


Łukasz wrote:
Hi, I want to learn how to optimize cache usage in gcc. I found the builtin function __builtin_prefetch, which should prefetch data into the cache, so I used the canonical :) example of vector addition.

for (i = 0; i < n; i++)
  {
    a[i] = a[i] + b[i];
    __builtin_prefetch (&a[i+1], 1, 1);
    __builtin_prefetch (&b[i+1], 0, 1);
    /* ... */
  }

and compiled it with gcc without special options ... but it's slower than

for (i = 0; i < n; i++)
  {
    a[i] = a[i] + b[i];
    /* ... */
  }

so maybe I should compile it with some extra options to take advantage of cache prefetching? (-fprefetch-loop-arrays doesn't work)



Under normal settings, on CPUs of the last 6 years or so, you are prefetching what the hardware prefetcher has already prefetched. If your search engine doesn't find you many success stories about the use of this feature, that might be a clue that it takes some serious investigation. You would look for slow spots in your code which don't fall into the usual hardware-supported prefetch patterns (linear access with not too large a stride, or pairs of cache lines), and experiment with fetching the data sufficiently far in advance for it to do some good, without exceeding your cache capacity.

I do see a "success story" about prefetching for a reversed loop. As the author doesn't divulge the CPU in use, one suspects it might be something like the old Athlon32, which supported hardware prefetch only in the forward direction. Don't you like advice which assumes no one will ever use a CPU different from (e.g. more up to date than) the author's favorite?
