From benchmarks run with Gcc:

1) Most of the optimization (at least 80% of it) doesn't come from processor-specific instructions but from selecting the alternatives that are better suited to a specific processor. On a PIII you gain around 10% when using -mcpu=i686 instead of -mcpu=i386 (-mcpu=i686 means the compiler will select the sequences that are fastest on the PIII family but will use only 386 instructions), while -march=i686 gains only a paltry 2% over -mcpu=i686. I don't know if this is because there is little to be gained or because gcc does a bad job.

2) RedHat compiles most packages with -mcpu=i686 (except for software that has parts in assembler, like the kernel and glibc; for those you get a processor-specific package compiled with -march=). Thus if you are using a PIV and recompile with -march=pentium4, you will probably gain little from the -mcpu=pentium4 to -march=pentium4 step (ie from the use of processor-specific instructions), since processor-specific instructions are nearly irrelevant even in the far more mature i686 optimizer. For the -mcpu=i686 to -mcpu=pentium4 step it would be nice if someone ran a few benchmarks: on one hand the Pentium 4 is reported to be highly sensitive to exact instruction ordering (much more so than a PIII), but on the other hand the PIV optimizer in gcc is quite young and I would bet it doesn't do an outstanding job.

3) The above discussion does not cover the use of MMX/SSE instructions. I benchmarked them and they seemed to produce zero difference. However, a) my benchmark was probably not adequate for exercising them, and b) the processors I have access to are slow when switching from MMX to SSE mode and back, so you need long sequences of MMX instructions in order to recover the "investment". Newer processors like the Athlon XP have nearly zero mode-switching overhead, so you would probably get better MMX/SSE results on an Athlon or a PIV.

Gcc 3.2 versus the Intel compiler:

1) Benchmarks compiled with Icc seem to run 30 to 40% faster than with Gcc.
However, a) the binaries are much bigger (double the size or more; Icc seems to do a LOT of inlining), and b) when you read Icc's documentation you notice that Icc does function inlining at the -O1 level of optimization, while Gcc does not use optimizations that have harmful side effects at -O2 or below. Since function inlining makes code bigger, gcc does not use it at -O2: you have to use -O3. Thus the only valid comparison is Icc -O1 versus gcc -O3. At those settings Gcc code ran nearly as fast as Icc's; in some tests it was even faster. The code was larger than with gcc -O2 but still much smaller than Icc's.

2) Optimization levels above -O1 seem to have zero effect with Icc. In gcc the "stopping point" is -O3: beyond it the gains are very small.

3) With Icc you can turn on the flags for interprocedural optimizations. These made my benchmarks run another 20 or 30% faster above the base result. No combination of gcc flags lets you even approach the performance you get from Icc with interprocedural optimizations turned on, still less when you also allow optimizations across files. However, both of these make Icc code significantly larger (remember it was already very large). That is why, while interprocedural optimizations are great for benchmarks, I am not so sure they would be a good idea for, say, StarOffice: there is a good chance the much larger Icc binaries would run out of cache or TLB entries, and that would cause a slowdown far larger than the acceleration from Icc's better code.

A bit of common sense.

Frankly, I am a bit annoyed when I read the hype about Gentoo or LFS and how you will get precisely tuned binaries that will cure cancer and bring peace on earth. IMHO this is drivel for the mathematically impaired, at least if you are using a PIII. Remember that RedHat's ordinary binaries are only about 2% away from the maximum you can get (by recompiling with -march=i686), and that the special packages (eg kernel and glibc) are already compiled with full optimization. What does that mean?
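The break-even arithmetic in the next paragraph can be sketched in a few lines of Python (the 2% figure is the -march=i686 gain measured above; the one-day rebuild time and the 10% CPU-bound fraction are illustrative assumptions, not measurements):

```python
def breakeven_days(rebuild_days, speedup, cpu_bound_fraction=1.0):
    """Days of wall-clock time until the time saved by faster binaries
    equals the time invested in recompiling them."""
    return rebuild_days / (speedup * cpu_bound_fraction)

# One day of rebuilding, 2% speedup, machine doing nothing but
# number crunching:
print(breakeven_days(1.0, 0.02))        # ~50 days

# Illustrative assumption: only 10% of the machine's time is CPU-bound
# work (the rest is disk I/O, waiting for user input, etc):
print(breakeven_days(1.0, 0.02, 0.10))  # ~500 days -- past upgrade day
```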
Let's say your box spends a day recompiling the distribution. With a 2% speedup you will only recover that investment after 1/0.02 = 50 days. Nearly two months. And this assumes a) the box spends those two months doing pure number crunching, with no disk activity and no waiting for user input, and b) it spends zero time in glibc or the kernel (except for clock ticks), since the original glibc and kernel are already compiled with full optimizations. In a realistic scenario you will never recover your investment before upgrade day.

I don't know about the PIV. For the Athlon I can only make an educated guess: AMD knows well that most of the time its processors will be running code that has been optimized for Intel ones, so AMD cannot make processors whose performance crumbles when the instruction sequence is not tuned exactly for them. AMD processors therefore have either to be agnostic (ie sequence A and sequence B are equally fast on them) or to have speed tables close to those of their main Intel rival (ie if A is faster than B on Intel, AMD will ensure it is also faster on the Athlon). That is why I doubt compiling specifically for the Athlon makes code much faster than the PIII-optimized code shipped by RedHat. It also depends on whether gcc is really good at optimizing for the Athlon. And that is a big if.

Anyone willing to run a few benchmarks on an Athlon or a PIV?

JFM

_______________________________________________
Redhat-devel-list mailing list
Redhat-devel-list@redhat.com
https://listman.redhat.com/mailman/listinfo/redhat-devel-list