Re: i386 kernel not included?

On Wed, 23 Oct 2002, Thomas Dodd wrote:

>> But if most pre-i686 CPUs (pre PPro/PII/Athlon) run the i386 code
>> mix faster than the pentium mix, why not supply the i386 mix?
>> I would think there are more 486s, P/MMX, K5, K6, and Cyrix CPUs
>> still in use than Pentiums (pre MMX).
>
>
>A test using a simple C source file:
>
>-march=i386 -mcpu=i586  and -march=i586 -mcpu=i586
>were the same.

Yep, that's to be expected as previously discussed.  ;o)


>-march=i386 -mcpu=i586  and -march=i386 -mcpu=i686
>had a lot of differences. The instruction mix was very different.

Right, that's to be expected.  While both of the above two will 
use the i386 compatible instruction set, they will each choose 
different instructions based on which instructions perform best 
for the target CPU, and order them also in a manner that works 
best for the target CPU.  i586 and i686 class machines differ a 
fair amount in this regard, so I'd expect the generated code to 
look quite different, even though they're using the same 
instruction set.


>-march=i386 -mcpu=i586  and  -march=i386 -mcpu=athlon
>Very different too.

Same as above.


>-march=i386 -mcpu=i686  was the same as -march=i386 -mcpu=athlon
>Most interesting to me,
>The mix is different.
>
>example
>i686                      athlon
>movl -24(%ebp), %edx      andl -24(%ebp), %eax
>andl %edx, %eax
>
>
>movl %eax, %edx           imull $100, %eax, %edx
>movl %edx, %eax
>sall $2, %eax
>addl %edx, %eax
>leal 0(,%eax,4), %edx
>addl %edx, %eax
>leal 0(,%eax,4), %edx

Very interesting.  I didn't realize gcc 3.2 would actually be 
this different with -mcpu=athlon.


>That's a large difference to me. 1 instruction instead of 7,
>that allows better usage of the instruction decoders, and less
>pressure on the L1 cache, probably L2 as well. Also less
>register pressure, the first one leaves %edx alone, free for
>other uses.

Yes, the athlon example above uses far fewer instructions and 
has a smaller cache footprint, but does it perform as well as the 
code on the left for i686?  I'm not saying it does or doesn't, 
but rather that it would be nice to see actual timings of the 
code.  The idea here is that smaller code doesn't necessarily 
mean faster code.  I don't have manuals handy to look up IMUL et 
al. for timings.


>This one file doesn't save much, but by the time you do 
>a full app, it could be a lot.

It could.  It's definitely important to do profiling though.


>I need a good example app to test with, to see what
>effect this has in a larger app.

Good idea.  If you gprof/oprofile it, post your results too.

Take care,
TTYL

-- 
Mike A. Harris		ftp://people.redhat.com/mharris
OS Systems Engineer
XFree86 maintainer
Red Hat Inc.


