On Wed, 23 Oct 2002, Thomas Dodd wrote:

>> But if most pre i686 CPUs (pre PPro/PII/Athlon) run the i386 code
>> mix faster than the pentium mix, why not supply the i386 mix.
>> I would think there are more 486s, P/MMX, K5, K6, and Cyrix CPUs
>> still in use than Pentiums (pre MMX).
>
>
>A test using a simple C source file:
>
>-march=i386 -mcpu=i586 and -march=i586 -mcpu=i586
>were the same.

Yep, that's to be expected as previously discussed. ;o)

>-march=i386 -mcpu=i586 and -march=i386 -mcpu=i686
>had a lot of differences. The instruction mix was very different.

Right, that's to be expected.  While both of the above will use the
i386-compatible instruction set, each will choose different
instructions based on which ones perform best on the target CPU, and
will also order them in the way that works best for that CPU.  i586
and i686 class machines differ a fair amount in this regard, so I'd
expect the generated code to look quite different, even though both
use the same instruction set.

>-march=i386 -mcpu=i586 and -march=i386 -mcpu=athlon
>Very different too.

Same as above.

>-march=i386 -mcpu=i686 was the same as -march=i386 -mcpu=athlon
>Most interesting to me,
>The mix is different.
>
>example
>i686                        athlon
>movl -24(%ebp), %edx        andl -24(%ebp), %eax
>andl %edx, %eax
>
>
>movl %eax, %edx             imull $100, %eax, %edx
>movl %edx, %eax
>sall $2, %eax
>addl %edx, %eax
>leal 0(,%eax,4), %edx
>addl %edx, %eax
>leal 0(,%eax,4), %edx

Very interesting.  I didn't realize gcc 3.2 would actually be this
different with -mcpu=athlon.

>That's a large difference to me. 1 instruction instead of 7,
>that allows better usage of the instruction decoders, and less
>pressure on the L1 cache, probably L2 as well. Also less
>register pressure, the first one leaves %edx alone, free for
>other uses.

Yes, the athlon example above uses far fewer instructions and has a
smaller cache footprint, but does it perform as well as the code on
the left does for i686?
I'm not saying it does or doesn't, but rather that it would be nice
to see actual timings of the code.  The idea here being that smaller
code doesn't necessarily mean faster code.  I don't have manuals handy
to look up IMUL et al. for timings.

>This one file doesn't save much, but by the time you do
>a full app, it could be a lot.

It could.  It's definitely important to do profiling though.

>I need a good example app to test with, to see what
>effect this has in a larger app.

Good idea.  If you gprof/oprofile it, post your results too.

Take care,
TTYL

--
Mike A. Harris		ftp://people.redhat.com/mharris
OS Systems Engineer
XFree86 maintainer
Red Hat Inc.

--
Psyche-list mailing list
Psyche-list@redhat.com
https://listman.redhat.com/mailman/listinfo/psyche-list
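
For anyone who wants the timings asked for above, a rough sketch of a
micro-benchmark follows.  It is not from the thread: the function names
mul100_chain and time_mul100_chain are made up for illustration, and it
assumes a dependent chain of multiply-by-100 operations is a fair stand-in
for the quoted snippet.  The idea would be to build the same file twice,
once with -march=i386 -mcpu=i686 and once with -march=i386 -mcpu=athlon
(on later gcc releases the x86 spelling is -mtune), check with gcc -S that
the loop really contains the shift/add sequence in one build and imull in
the other, and then compare the reported times on each CPU of interest.

```c
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical benchmark kernel: a chain of dependent multiplies by 100.
 * Each iteration feeds the previous result back in, so the multiply
 * cannot be hoisted out of the loop.  Unsigned arithmetic wraps modulo
 * 2^32, so overflow is well defined.  Which instructions implement
 * "* 100u" is left to the compiler's tuning flags.
 */
uint32_t mul100_chain(uint32_t seed, unsigned long iters)
{
    uint32_t acc = seed;
    unsigned long i;

    for (i = 0; i < iters; i++)
        acc = acc * 100u + (uint32_t)i;
    return acc;
}

/*
 * Run the kernel for `iters` iterations and return the CPU time used,
 * in seconds, as measured by clock().  The final value is written to
 * *result so the compiler has to keep the loop.
 */
double time_mul100_chain(unsigned long iters, uint32_t *result)
{
    clock_t start = clock();

    *result = mul100_chain(1u, iters);
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Calling time_mul100_chain() from a small main() with an iteration count
taken from the command line, and printing both the time and the result,
should be enough for a first comparison.  A loop this tight only measures
the multiply itself, so it says nothing about the decoder and cache
effects mentioned above; those would need a real application under
gprof or oprofile.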