On 12/11/06, Steinar H. Gunderson <sgunderson@xxxxxxxxxxx> wrote:
> On Mon, Dec 11, 2006 at 09:05:56AM -0200, Daniel van Ham Colchete wrote:
> > But trust me on this one. It's worth it.
> You know what? I don't.
So test it yourself.
> > Think of this: PostgreSQL and GNU LibC use a lot of complex algorithms: btree, hashes, checksums, string functions, etc. And you have a lot of ways to compile it into binary code. Now you have Pentium4's vectorization that allows you to run plenty of instructions in parallel, but AMD doesn't have this. Intel also has SSE2, which makes double-precision floating-point operations a lot faster; AMD also doesn't have this (at least on 32 bits).
> Athlon 64 has SSE2, also in 32-bit mode.
True. But I'm not saying that PostgreSQL is faster on AMD or on Intel systems. I'm saying that it's a lot faster if you compile PostgreSQL and your glibc for your processor. AMD also has features that Intel systems don't: 3DNow!, for example. The fact is that if your distro is compatible with a plain Athlon, you can use neither SSE nor SSE2.
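Whether a given binary uses SSE2 depends on the flags it was built with, not only on the CPU it runs on. A minimal C sketch, assuming GCC or Clang on x86: the predefined `__SSE2__` macro says what the compiler was allowed to emit, and `__builtin_cpu_supports("sse2")` says what the running CPU offers. The function names are hypothetical helpers for illustration.

```c
#include <assert.h>
#include <stdio.h>

/* 1 if the compiler was allowed to emit SSE2 code for this file.
   GCC and Clang define __SSE2__ when the target flags permit it
   (e.g. -msse2, -march=pentium4, -march=athlon64, or any x86-64 build). */
int built_with_sse2(void)
{
#ifdef __SSE2__
    return 1;
#else
    return 0;
#endif
}

/* 1 if the CPU running this program reports SSE2, else 0.
   __builtin_cpu_supports is a GCC/Clang builtin available on x86 only. */
int cpu_has_sse2(void)
{
#if defined(__i386__) || defined(__x86_64__)
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") ? 1 : 0;
#else
    return 0; /* not an x86 target: SSE2 does not apply */
#endif
}

/* 0 only when the binary may contain SSE2 code the CPU cannot run. */
int sse2_build_is_safe(void)
{
    return !built_with_sse2() || cpu_has_sse2();
}

/* Prints both answers side by side. */
void report_sse2(void)
{
    printf("compiled with SSE2: %d\n", built_with_sse2());
    printf("CPU supports SSE2:  %d\n", cpu_has_sse2());
    assert(sse2_build_is_safe());
}
```

On a 32-bit build with `-march=athlon` or `-march=i686`, `built_with_sse2()` should return 0 even on an Athlon 64 where `cpu_has_sse2()` returns 1; rebuilding with `-march=athlon64` or `-march=pentium4` should flip the first result to 1.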
> Of course, it doesn't really matter, since at the instant you hit the disk even once, it's going to take a million cycles and any advantage you got from saving single cycles is irrelevant.
Really? We're talking about high-performance systems, and every case is different. I once saw DDR2 RAM-based storage (around 1 TB). Before you say it, I don't understand how it works, but you don't lose your data on a reboot or power failure. It was very expensive, but it really solves the I/O bottleneck problem. And even when your bottleneck is I/O, it still makes no sense to waste CPU resources unnecessarily.
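Both sides of this exchange can be put in numbers with Amdahl's law: if only a fraction p of the total run time is CPU-bound and better code makes that part s times faster, the whole run gets 1 / ((1 - p) + p / s) times faster. A small C sketch; the function names and the figures in the usage note are illustrative assumptions, not measurements.

```c
#include <assert.h>

/* Overall speedup when only part of the run time gets faster
   (Amdahl's law): 1 / ((1 - p) + p / s).
   p: fraction of total time spent in the CPU-bound part (0..1)
   s: speedup of that part alone, e.g. from different compiler flags */
double amdahl_speedup(double p, double s)
{
    assert(p >= 0.0 && p <= 1.0);
    assert(s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}

/* Clock cycles that elapse while waiting for one I/O of the given latency. */
double cycles_per_io(double latency_seconds, double clock_hz)
{
    return latency_seconds * clock_hz;
}
```

For example, with p = 0.1 (mostly waiting on disk) and s = 1.2, `amdahl_speedup(0.1, 1.2)` is about 1.017; with p = 0.9 (data on RAM-backed storage) it is about 1.176. Likewise, `cycles_per_io(0.005, 2e9)` shows that a hypothetical 5 ms disk access on a 2 GHz CPU costs about 10 million cycles.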
> > Imagine that you are GCC and that you have two options in front of you: you can use FSQRT or FDIV plus 20 ADD/SUB.
> Could you please describe a reasonable case where GCC would have such an option? I cannot imagine any.
As I said, it is an example. Take floating-point division. There are plenty of ways to do it: x87, SSE, SSE2, 3DNow!, etc. Here GCC has to make a choice, and this is only one case. Compiler optimizations are usually complex, and the processor's instruction timings count a lot: for every optimization the compiler has to estimate the quickest path, so it uses a model of how the target processor will run the code. If you look at AMD's docs, you will see that their SSE2 implementation is internally different from Intel's. So sometimes the quickest path uses SSE2 and sometimes it doesn't. You also have to count the cost of moving values between SSE registers and the general-purpose or x87 registers. If you still can't imagine a case, read Intel's assembler reference: you'll see that there are many ways of doing the same thing.
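To see which choice GCC makes, a tiny division loop can be compiled to assembly with different flags. A hedged sketch: `divide_all` is a hypothetical function, and the instructions named in the comment assume GCC on 32-bit x86 (which needs multilib support for `-m32`); exact output varies with the GCC version.

```c
#include <assert.h>
#include <stddef.h>

/* dst[i] = a[i] / b[i].
   The same source can become different machine code, e.g. on 32-bit x86:
     gcc -O2 -m32 -mfpmath=387 -S         -> x87 fdiv-family instructions
     gcc -O2 -m32 -msse2 -mfpmath=sse -S  -> SSE2 divsd (or divpd if vectorized) */
void divide_all(double *dst, const double *a, const double *b, size_t n)
{
    assert(n == 0 || (dst != NULL && a != NULL && b != NULL));
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] / b[i];
}
```

Comparing the `.s` files produced with `-march=pentium4` and with `-march=athlon64` is one way to check whether per-processor tuning actually changes the generated code.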
> > An example that I know of: it's impossible to run my software at a highly demanding customer without compiling it for its processor (I make 5 compilations on every release).
> What's "your software"? How can you make such assertions without backing them up? How can you know that the same holds for PostgreSQL? As Mike said, point to the benchmarks showing this "essential" difference between -O2 and -O2 -mcpu=pentium4 (or whatever). The only worthwhile difference I can think of is that glibc can use the SYSENTER instruction if it knows you have a 686 or higher (which includes AMD), and with recent kernels, I'm not even sure if that is needed anymore.
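The benchmark being asked for can be approximated with a small timing harness built twice, once with plain `-O2` and once with CPU-specific flags. A minimal C sketch under assumptions: a POSIX system with `clock_gettime`, and Adler-32 as a stand-in CPU-bound kernel rather than anything taken from PostgreSQL or glibc. Note that on x86, current GCC documents `-mcpu=` as a deprecated synonym for `-mtune=`; it is `-march=` that permits new instructions such as SSE2.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Adler-32 checksum: a small CPU-bound stand-in kernel (not PostgreSQL code). */
uint32_t adler32(const unsigned char *buf, size_t len)
{
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* Runs adler32 over buf `reps` times; returns elapsed seconds and stores
   the summed checksums in *sink so the work cannot be discarded. */
double time_adler32(const unsigned char *buf, size_t len, int reps, uint32_t *sink)
{
    /* Calling through a volatile pointer stops the compiler from hoisting
       or merging the repeated, identical calls. */
    uint32_t (*volatile fn)(const unsigned char *, size_t) = adler32;
    struct timespec t0, t1;
    uint32_t acc = 0;

    assert(reps > 0 && sink != NULL);
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        acc += fn(buf, len);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    *sink = acc;
    return (double)(t1.tv_sec - t0.tv_sec)
         + (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}
```

Timing a buffer of a few megabytes over a few hundred repetitions, with both builds on the same machine, gives a direct comparison for this kernel; a claim about PostgreSQL itself would still need a database-level benchmark.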
Steinar, you should really test it. I won't read the PostgreSQL source to point you to where it could use SSE or SSE2 or whatever, and I won't read glibc's code either. You don't need to believe what I'm saying. You can read the GCC docs, Intel's assembler reference, and AMD's docs about their processors and how different that architecture is.

Best regards,
Daniel Colchete