Re: RFC: Optimizing for 386

Jakub Jelinek <jakub@xxxxxxxxxx> · Wed, 19 Jan 2005 12:02:36 -0500

On Wed, Jan 19, 2005 at 10:35:43AM -0600, Joseph D. Wagner wrote:
> > This is not true. The default optimization is for Pentium 4 class
> > processors.
> 
> This is not accurate.  gcc has two separate sets of optimizations.  The
> first (mtune) tunes everything except the ABI.  This is the part that is
> optimized for the Pentium 4.  The second (march) actually tunes the ABI. 
> The second is optimized for 386.  In other words, it won't take advantage
> of any instruction that didn't exist on the original 386.
> 
> I would like my ABI, especially for the graphics programs, to be optimized
> for more modern architecture, like i686.

Please read the archives, all this has been answered several times already.

Yes, we know very well the difference between -march and -mtune.
The current CFLAGS for *.i386.rpm packages (-march=i386 -mtune=pentium4)
are just fine for the whole distro.
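The C sketch below shows what the two flags mean for a binary. It assumes a reasonably recent GCC or Clang, since `__builtin_cpu_supports` postdates this thread, and the function names are made up for illustration. `__SSE2__` is predefined only when the compiler may emit SSE2 code. -march=pentium4 turns that on and -mtune=pentium4 does not. The runtime check asks the CPU that is actually executing the program.

```c
#include <stdbool.h>

/* True if this file was compiled with SSE2 enabled, i.e. the compiler may
 * emit SSE2 instructions anywhere.  Set by -march=pentium4, -msse2 or an
 * x86-64 target; -mtune=pentium4 alone does NOT define __SSE2__. */
bool baseline_assumes_sse2(void)
{
#ifdef __SSE2__
    return true;
#else
    return false;
#endif
}

/* True if the CPU we are running on actually supports SSE2. */
bool cpu_has_sse2(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();               /* harmless to call outside constructors */
    return __builtin_cpu_supports("sse2");
#else
    return false;                       /* not an x86 target */
#endif
}
```

A 32-bit build with -march=i386 -mtune=pentium4 gets false from baseline_assumes_sse2() and runs on any ix86 CPU. A -march=pentium4 build gets true and may die with SIGILL on a machine where cpu_has_sse2() would be false.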

But please tell us which instructions could be useful and would outweigh
the negatives (making FC impossible to run on certain hardware).

The ISA changes between i386 and i586 are uninteresting for the vast
majority of programs.  The main differences are atomic and specialized
instructions; atomic instructions are used mostly by the kernel, the C
library and the thread library, and all of those come in i686 variants.
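Here is an example of the kind of atomic primitive meant here. It uses GCC's `__sync` builtins, which appeared in GCC 4.1 and so are newer than this thread. The helper names are hypothetical. On -march=i486 and later, fetch-and-add and compare-and-swap can be inlined as `lock xadd` and `lock cmpxchg`. Plain i386 has neither instruction, so with -march=i386 GCC typically emits out-of-line calls instead.

```c
#include <stdbool.h>

/* Atomically add 'delta' to *p and return the previous value
 * (lock xadd on i486+). */
int fetch_add_int(int *p, int delta)
{
    return __sync_fetch_and_add(p, delta);
}

/* Atomically store 'desired' in *p if it still equals 'expected'
 * (lock cmpxchg on i486+); returns true on success. */
bool cas_int(int *p, int expected, int desired)
{
    return __sync_bool_compare_and_swap(p, expected, desired);
}

/* A minimal test-and-set spinlock built on compare-and-swap. */
typedef struct { int locked; } spinlock;

void spin_lock(spinlock *l)
{
    while (!cas_int(&l->locked, 0, 1))
        ;                               /* busy-wait until released */
}

void spin_unlock(spinlock *l)
{
    __sync_lock_release(&l->locked);   /* store 0 with release semantics */
}
```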

The i686 ISA adds conditional moves (some argue that this is an optional
i686 feature, but GCC's definition of i686 is i686 with cmov).
Now, cmov* speeds things up on certain architectures, but:
1) using it everywhere would mean that FC doesn't support various
   not-so-old VIA CPUs, etc.
2) on P4, cmov* instructions are no win over using branches
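The sketch below shows code where cmov can matter. At -O2, GCC may if-convert the comparisons into cmp/cmov sequences when the target ISA has cmov (-march=i686 or later, or any x86-64 target). With -march=i386 it has to use conditional jumps. Whether it actually uses cmov is an optimizer decision, so check the `gcc -O2 -S` output rather than assuming. The runtime check relies on the GCC/Clang `__builtin_cpu_supports` builtin. The function names are hypothetical.

```c
#include <stdbool.h>

/* Candidate for if-conversion: with cmov available, GCC may compile these
 * comparisons without any conditional jumps; with -march=i386 it cannot. */
int clamp_int(int x, int lo, int hi)
{
    if (x < lo)
        return lo;
    if (x > hi)
        return hi;
    return x;
}

/* Runtime check for the CMOV feature bit on the executing CPU. */
bool cpu_has_cmov(void)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("cmov");
#else
    return false;                       /* not an x86 target */
#endif
}
```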

Later CPUs add MMX, 3DNow!, SSE, SSE2 and SSE3 instructions.
But:
1) only GCC 4.0+ supports autovectorization, and only for SSE/SSE2
2) programs/libraries that have MMX/3DNow!/SSE/SSE2/SSE3 assembly
   or use {,p,e,x}mmintrin.h usually choose between several
   implementations at runtime depending on what the CPU supports
   (e.g. various graphics/multimedia programs do this),
   or, like gmp, are packaged so that the dynamic linker
   picks optimized libraries on SSE2+ capable CPUs
3) with SSE2+, -mfpmath=sse can speed up FPU-intensive programs
   or libraries quite a bit.  But SSE2+ is a very high bar for
   the lowest CPU supported by FCx/ix86 at the moment; it would
   mean not supporting even 2-year-old CPUs.  So, if a particular
   library shows measurable improvements with -mfpmath=sse,
   it is far better to ship it as a /usr/lib/sse2/ library
   instead of shipping everything in .pentium4.rpm's.
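The runtime selection from item 2 can be sketched in C as follows. This assumes a recent GCC or Clang on x86, because it relies on the `target("sse2")` function attribute and `__builtin_cpu_supports`, neither of which existed in GCC 4.0. The function names are hypothetical. The scalar loop is the portable fallback. It is also the kind of simple loop GCC 4.0+ can autovectorize when SSE2 is enabled at compile time. The SSE2 version processes two doubles per iteration using intrinsics from emmintrin.h.

```c
#include <stddef.h>

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#include <emmintrin.h>          /* SSE2 intrinsics: __m128d, _mm_add_pd, ... */
#define HAVE_SSE2_PATH 1
#endif

/* Portable fallback: runs on any ix86 CPU. */
static void add_f64_scalar(double *dst, const double *a, const double *b,
                           size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

#ifdef HAVE_SSE2_PATH
/* Compiled with SSE2 enabled even if the rest of the file is built for
 * i386; only call it after checking that the CPU supports SSE2. */
__attribute__((target("sse2")))
static void add_f64_sse2(double *dst, const double *a, const double *b,
                         size_t n)
{
    size_t i = 0;
    for (; i + 2 <= n; i += 2) {            /* two doubles per 128-bit op */
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        _mm_storeu_pd(dst + i, _mm_add_pd(va, vb));
    }
    for (; i < n; i++)                      /* leftover element, if any */
        dst[i] = a[i] + b[i];
}
#endif

typedef void (*add_fn)(double *, const double *, const double *, size_t);

/* Pick an implementation based on what the running CPU supports. */
static add_fn select_add_impl(void)
{
#ifdef HAVE_SSE2_PATH
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse2"))
        return add_f64_sse2;
#endif
    return add_f64_scalar;
}
```

A caller would look up the implementation once, e.g. `add_fn add = select_add_impl();`, and reuse the pointer. The package itself can then stay at -march=i386.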

Note that the vast majority of packages in the distro aren't
CPU-bound, so it is just a matter of selecting those where there
are measurable improvements that justify alternate rpms, or a
second set of libraries in the .i386.rpm packages, etc.

	Jakub