On 12/11/06, Daniel van Ham Colchete <daniel.colchete@xxxxxxxxx> wrote:
But, trust me on this one. It's worth it. Think of this: PostgreSQL and GNU libc use a lot of complex algorithms: B-trees, hashes, checksums, string functions, etc. And you have a lot of ways to compile them into binary code. Now you have the Pentium 4's vectorization, which allows you to run plenty of instructions in parallel, but AMD doesn't have this. Intel also has SSE2, which makes double-precision floating-point operations a lot faster; AMD also doesn't have this (at least on 32 bits). Now imagine that you're Red Hat and that you have to deliver one CD for both AMD and Intel servers. That means you can't use any AMD-specific or Intel-specific technology at the binary level.
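To make that concrete, here is a toy sketch (the file name and exact flags are mine, just illustrative of era-appropriate gcc options, not anything Daniel posted) of the kind of double-precision loop where target-specific flags change the generated code:

/* vecdemo.c -- hypothetical example: a loop gcc may compile to SSE2
 * packed-double instructions when the target allows it.
 *
 * generic build: gcc -O2 -march=i386 vecdemo.c -o vecdemo
 * tuned build:   gcc -O2 -march=pentium4 -msse2 -ftree-vectorize vecdemo.c -o vecdemo
 */
#include <stdio.h>

#define N 1000000
static double a[N], b[N], c[N];

int main(void)
{
    int i, rep;

    for (i = 0; i < N; i++) {
        a[i] = i * 0.5;
        b[i] = i * 0.25;
    }
    for (rep = 0; rep < 100; rep++)
        for (i = 0; i < N; i++)
            c[i] += a[i] * b[i];  /* double-precision math SSE2 can do two lanes at a time */
    printf("%f\n", c[N - 1]);
    return 0;
}

With the generic target the compiler is stuck with one x87 operation per element; with -msse2 it can use packed-double instructions. Whether that matters for a database workload is exactly what's in dispute below.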
AMD processors since the K6-2, and I think Intel ones since the Pentium Pro, are essentially RISC cores with a hardware front end that translates and reorders x86 instructions into micro-ops on the fly. Instruction choice and ordering were extremely important on older 32-bit architectures (like the 486) but matter much less these days. I think you will find that an optimized glibc might be faster in specific contrived cases, but the whole is unfortunately less than the sum of its parts. While SSE2 might be able to optimize things like video decoding, for most programs it's of little benefit and IMO a waste of time. Also, as others pointed out, things like cache hits/misses and I/O considerations are much more important than instruction execution speed.

We ran Gentoo here for months and did not find it fast enough to merit the bleeding-edge quirks it has in production environments. If you dig assembly, there was an interesting tackle of the spinlock code on the -hackers list last year, IIRC.

merlin
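A quick toy illustration of the cache point (this sketch is mine, not from the thread; the array size and stride are arbitrary): both calls below do exactly the same number of additions, but the strided walk misses cache on nearly every access and typically runs many times slower, dwarfing any per-instruction tuning.

/* stride.c -- same work, very different speed, because of memory access pattern.
 * Build: gcc -O2 stride.c -o stride
 */
#include <stdio.h>
#include <time.h>

#define N (1 << 24)           /* 16M ints (64 MB), far larger than any L2 cache */
static int data[N];

static double walk(int stride)
{
    clock_t t0 = clock();
    long sum = 0;
    int i, j;

    /* touch all N elements exactly once, in stride-sized jumps */
    for (j = 0; j < stride; j++)
        for (i = j; i < N; i += stride)
            sum += data[i];
    if (sum == 42)            /* keep the compiler from discarding the loop */
        printf("!");
    return (double)(clock() - t0) / CLOCKS_PER_SEC;
}

int main(void)
{
    printf("sequential: %.2fs\n", walk(1));
    printf("strided:    %.2fs\n", walk(4096));  /* 16 KB jumps: a cache miss per access */
    return 0;
}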