optimising for CPU and compiler [was: Data and common sense about PIV optimizations, Gcc and the Intel compiler;]

Florin Andrei <florin@xxxxxxx> · 13 Dec 2002 16:12:02 -0800

Excellent message!

I'll insert my comments below. When reading them, please keep in mind
that my main focus is multimedia (apps, codecs, players,
transcoding...).

On Fri, 2002-12-13 at 13:19, Jean Francois Martinez wrote:
> 
> 3) The above discussion does not refer to use of MMX/SSE instructions.
> I benchmarked them and seemed to produce zero difference.  However

MMX/SSE really make a difference sometimes. They are widely used by
almost every sane multimedia application/codec/library.

I recently saw a message saying that MMX seems to be quite a bit more
efficient than SSE, at least for some particular applications/codecs.
But that could be "the benchmark effect".
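
To make the point concrete, here is a minimal sketch of the kind of loop where SSE pays off: adding two float buffers four samples at a time. The function names (mix_scalar, mix_sse) are made up for illustration and do not come from any real codec; the intrinsics are the standard ones from <xmmintrin.h>. On a build without SSE the second function simply falls back to the scalar loop. Real codecs usually hand-write this sort of thing in assembler, so treat it as an illustration rather than a recipe.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE__)
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadu_ps, _mm_add_ps */
#endif

/* Scalar reference: one float per iteration. (Hypothetical name.) */
void mix_scalar(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    for (size_t i = 0; i < n; i++)
        out[i] = a[i] + b[i];
}

/* SSE version: four floats per iteration, scalar loop for the tail.
 * Without SSE support the vector loop is compiled out and only the
 * scalar loop runs. (Hypothetical name.) */
void mix_sse(const float *a, const float *b, float *out, size_t n)
{
    assert(a && b && out);
    size_t i = 0;
#if defined(__SSE__)
    for (; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);   /* unaligned load of 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(out + i, _mm_add_ps(va, vb));
    }
#endif
    for (; i < n; i++)                      /* leftover elements */
        out[i] = a[i] + b[i];
}
```

Both functions produce the same output; whether the SSE one is actually faster depends on the CPU, the buffer sizes and the compiler, which is exactly why it has to be measured.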

> 1) Benchmarks compiled with ICC seem to be 30 to 40% faster than with
> Gcc.  However a) they are much bigger (double size or more), ICC seems

In a recent test of the latest gcc against the latest icc, I noticed
that, while icc still produces binaries that are faster overall, the
difference is smaller than it used to be, and for quite a few tasks
gcc-3.2 produces binaries that are actually faster.
The benchmark was focused on scientific calculations.

> to use -O3. Thus the only valid comparison is Icc -O1  versus gcc -O3. 
> At those settings Gcc code ran nearly as fast as Icc's.  In some tests
> it was even faster.  Code was larger than with gcc -O2 but still much
> smaller than Icc's.

Yup, that's the thing I mentioned.

> 3) With Icc you can turn on the flags for interprocedural optimizations.
> These made my benchmarks run still 20 or 30% faster above the base
> result.  There is no combination of flags in gcc allowing to even
> touch the level of performance you get with Icc when interprocedural
> optimizations are turned on, and still less when you allow
> optimizations across files.

Same with MIPSPro - the SGI commercial compiler. It has the so-called
IPA option - Inter-Procedural Analysis. When you turn it on, it performs
optimisations over the entire program. It blows the guts off of gcc.
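
A small sketch of why whole-program analysis helps, assuming a typical multimedia inner loop. The two functions below (clamp_u8 and brighten, both hypothetical names) are shown in one listing, but imagine them living in separate source files. Compiled separately, the compiler sees only a declaration of clamp_u8 and has to emit a real call for every pixel. With inter-procedural analysis across files it can see the body, inline it into the loop, and optimise the result as a whole.

```c
#include <assert.h>
#include <stddef.h>

/* Imagine this lives in pixel.c. (Hypothetical helper.) */
int clamp_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : v);
}

/* ... and this in filter.c, which only sees a prototype of clamp_u8.
 * Without cross-file optimisation every iteration pays for a call. */
void brighten(const unsigned char *in, unsigned char *out,
              size_t n, int delta)
{
    assert(in && out);
    for (size_t i = 0; i < n; i++)
        out[i] = (unsigned char)clamp_u8(in[i] + delta);
}
```

The price is the one mentioned in the quoted text: inlining copies the callee into each call site, so the binary grows.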

> However both of these make Icc code significantly larger (remember it
> was already very large).  That is why while interprocedural
> optimizations are great for benchmarks I am not so

Also great for applications that run as daemons, or for a long time (you
don't need to load them all the time). Also, on architectures that have
significantly higher internal bandwidth than Intel PCs, and CPUs with
very large caches (like SGI Origin 3k), the executable size matters less.

> Frankly I am a bit annoyed when I read the hype about Gentoo or LFS and
> how you will get precisely tuned binaries who will cure cancer and bring
> peace on earth. IMHO this is drivel for mathematically impaired  people.

:-)

> Anyone willing to run a few benchmarks on an Athlon or a PIV?

Well, I do a lot of multimedia stuff on an AthlonXP, and I always wanted
to run a gcc benchmark. The problem is, I can optimize, say, transcode
(the main multimedia converter), but the codecs it's using already have
the critical parts in assembler.
Still, maybe it's worth it. If I do it, I'll post the results.
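
For anyone who wants to try, a minimal sketch of a timing harness, assuming the code under test can be wrapped in a single function. It runs the workload several times and keeps the best CPU time reported by the standard clock() call, which filters out some of the noise from other processes. The names best_seconds, kernel_fn and sum_kernel are hypothetical, and sum_kernel is only a stand-in workload, not a real codec.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* A workload is any function taking an opaque context pointer. */
typedef void (*kernel_fn)(void *ctx);

/* Run fn `runs` times and return the smallest CPU time in seconds. */
double best_seconds(kernel_fn fn, void *ctx, int runs)
{
    assert(fn && runs > 0);
    double best = -1.0;
    for (int r = 0; r < runs; r++) {
        clock_t t0 = clock();
        fn(ctx);
        clock_t t1 = clock();
        double s = (double)(t1 - t0) / CLOCKS_PER_SEC;
        if (best < 0.0 || s < best)
            best = s;
    }
    return best;
}

/* Stand-in workload: adds 0..999999 into an accumulator. The volatile
 * access keeps the compiler from optimising the loop away. */
void sum_kernel(void *ctx)
{
    volatile unsigned long long *acc = ctx;
    for (unsigned long long i = 0; i < 1000000ULL; i++)
        *acc += i;
}
```

Build the same source with each compiler and set of flags, call best_seconds on the real workload, and compare the numbers. If the hot loops are already in assembler, expect the differences to be small.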

-- 
Florin Andrei

When it comes to discussing Linux, some people become temporarily
insane.

_______________________________________________
Redhat-devel-list mailing list
Redhat-devel-list@redhat.com
https://listman.redhat.com/mailman/listinfo/redhat-devel-list