Re: [4.4] Strange performance regression?

Hi Tim,

thanks for the reply.

On Wed, Oct 14, 2009 at 12:27 AM, Tim Prince <n8tm@xxxxxxx> wrote:
> Ian Lance Taylor wrote:
>>
>> In my experience, a performance drop in a tight loop when you remove a
>> line of code means that your loop is extremely sensitive to cache line
>> boundaries.  It can be difficult to find the optimal code other than
>> by testing various command line options.  Options to particularly test
>> are -falign-loops, -falign-labels, and -falign-jumps.
>
> That seems useful advice.  The align- options could help the hot loops fit
> Loop Stream Detector criteria.  If you set -funroll-loops, you may exceed
> the loop size which fits LSD on older CPUs, but you would often make the LSD
> unnecessary.

Blast it! -funroll-loops did the trick: the speed is now back within
5% of the optimal performance. Just for the record, the flags I'm
using right now are:

-O2 -march=core2 -funroll-loops -fomit-frame-pointer

\o/
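
To check whether the gain really comes from unrolling one hot loop rather
than from the flag being applied to the whole program, the option can be
enabled for a single function. The sketch below assumes GCC 4.4 or later,
which accepts an "optimize" function attribute; the function name
sum_array and its body are hypothetical stand-ins for the real kernel, and
as far as I know the GCC manual recommends this attribute for
experiments rather than for production builds.

```c
#include <stddef.h>

/*
 * Hypothetical hot loop. Only this function asks GCC for loop unrolling,
 * so the rest of the program can be built without -funroll-loops.
 * Compilers that do not know the attribute are expected to ignore it
 * (possibly with a warning).
 */
__attribute__((optimize("unroll-loops")))
double sum_array(const double *x, size_t n)
{
    double acc = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        acc += x[i];
    return acc;
}
```

Building with plain -O2 -march=core2 and comparing timings against the
full -funroll-loops build should show whether this one loop accounts for
the difference.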

>>
>> Also, be sure that you are using a -mtune option appropriate for the
>> processor on which you are running.  E.g., you mention Core2, so you
>> should be using -mtune=core2.
>
> For the 64-bit compiler, the default may be better than core2, but for
> 32-bit you should be using at least -march=pentium-m.  If you are using
> vectorizer, -mtune=barcelona could make a difference either way.
> How are you controlling which threads run on which cache, in case there are
> cache sharing considerations?

I've played a bit with the options, and -mtune=barcelona does seem
to make a small difference. At the moment the code is single-threaded.
I've been trying various approaches to parallelize it, but since the
algorithm is so constrained by memory bandwidth, I have yet to find a
solution that gives a reasonable speedup while keeping the overhead low.
Are there portable ways of controlling which threads run on which
cache?
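
For reference, a non-portable way of doing this on Linux is to pin each
thread to a chosen CPU, which indirectly decides which cache it uses. The
sketch below assumes Linux with glibc (sched_setaffinity and
sched_getaffinity behind _GNU_SOURCE); the helper names first_allowed_cpu
and pin_to_cpu are made up for illustration, and which CPU numbers share
a cache has to be looked up separately for the machine in question.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sched.h>

/* Return the lowest-numbered CPU the calling thread may run on, or -1. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    int cpu;

    CPU_ZERO(&set);
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/*
 * Restrict the calling thread to a single CPU.
 * A pid of 0 means "the calling thread".
 * Returns 0 on success, -1 on error (errno is set by the system call).
 */
int pin_to_cpu(int cpu)
{
    cpu_set_t set;

    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);
}
```

Each worker thread would call pin_to_cpu() once at startup, with CPU
numbers chosen so that threads working on the same data land on cores
that share a cache.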

Thanks again very much!

  Francesco.

