Kevin Kofler wrote:
> dragoran <dragoran <at> feuerpokemon.de> writes:
> > the last one (-m64) is weird (much slower!!) no -OX was used.
> Then your benchmarks are essentially worthless. Sorry, but since GCC defaults
> to -O0, which means no optimization whatsoever, i.e. very bad code, you should
> NEVER compile production code without an -O flag (generally -O2 or -Os), and
> especially not benchmarks!
> So please rerun your benchmarks with -O2 to get more useful results.
OK, done; see the attachment for the results.
The native 64-bit build is much faster now (than without -O), and it is now
roughly equal to the 32-bit performance.
The other results are also better than the old ones, but the differences
between them are not the same as before: all of them now have almost equal
performance.
So much for the old benchmarks with -O2 added...
I have also tried -mfpmath=sse and -ftree-vectorize, and they seem to
have a positive effect on module 1, which is (looking at the source code):
/*******************************************************/
/* Module 1. Calculate integral of df(x)/f(x) defined */
/* below. Result is ln(f(1)). There are 14 */
/* double precision operations per loop */
/* ( 7 +, 0 -, 6 *, 1 / ) that are included */
/* in the timing. */
/* 50.0% +, 00.0% -, 42.9% *, and 07.1% / */
/*******************************************************/
For x86_64 (the -m64 benchmarks) it seems that -Os is better than -O2
(maybe because x86_64 binaries are generally bigger and -Os
reduces that effect, i.e. fewer cache misses?).
Kevin Kofler
gcc -O2 -DUNIX flops.c -m32 -march=i386 -mtune=generic -o flops
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module Error RunTime (usec) MFLOPS
1 -8.1208e-11 0.0100 1403.2013
2 1.4704e-15 0.0086 815.2356
3 -3.8213e-15 0.0076 2238.5432
4 6.1151e-14 0.0079 1902.7553
5 -4.4419e-14 0.0159 1819.4941
6 7.7002e-15 0.0141 2059.8045
7 -6.6161e-13 0.0236 508.9145
8 2.2789e-14 0.0141 2127.2906
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 1029.1609
MFLOPS(2) = 1014.7499
MFLOPS(3) = 1567.2928
MFLOPS(4) = 2084.3364
-------------------------------------------------------------------
gcc -O2 -DUNIX flops.c -m32 -march=i686 -mtune=generic -o flops
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module Error RunTime (usec) MFLOPS
1 -8.1208e-11 0.0100 1403.2011
2 1.4704e-15 0.0091 773.0317
3 -3.8213e-15 0.0075 2252.4475
4 6.1151e-14 0.0079 1902.7549
5 -4.4419e-14 0.0159 1822.1738
6 7.7002e-15 0.0141 2057.5211
7 -6.6161e-13 0.0236 509.0832
8 2.2789e-14 0.0141 2128.4701
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 984.4080
MFLOPS(2) = 1015.3324
MFLOPS(3) = 1568.4767
MFLOPS(4) = 2086.2031
----------------------------------------------------------------------
gcc -O2 -msse2 -DUNIX flops.c -m32 -march=i686 -mtune=generic -o flops
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module Error RunTime (usec) MFLOPS
1 -8.1208e-11 0.0099 1409.8255
2 1.4704e-15 0.0094 747.2427
3 -3.8213e-15 0.0076 2233.9463
4 6.1151e-14 0.0078 1914.1380
5 -4.4419e-14 0.0160 1807.9726
6 7.7002e-15 0.0141 2051.8342
7 -6.6161e-13 0.0236 508.7459
8 2.2789e-14 0.0141 2123.7611
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 955.0271
MFLOPS(2) = 1014.0094
MFLOPS(3) = 1565.4547
MFLOPS(4) = 2082.1009
---------------------------------------------------------------------
gcc -O2 -DUNIX flops.c -m32 -march=k8 -mtune=k8 -o flops
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module Error RunTime (usec) MFLOPS
1 -8.1208e-11 0.0102 1372.0430
2 1.4704e-15 0.0085 821.2137
3 -3.8213e-15 0.0076 2250.1178
4 6.1151e-14 0.0079 1908.4292
5 -4.4419e-14 0.0160 1816.8225
6 7.7002e-15 0.0143 2026.0742
7 -6.6161e-13 0.0237 506.5646
8 2.2789e-14 0.0141 2127.2906
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 1036.3726
MFLOPS(2) = 1008.9609
MFLOPS(3) = 1558.4048
MFLOPS(4) = 2076.1624
-------------------------------------------------------------------------
gcc -O2 -DUNIX flops.c -m64 -march=k8 -mtune=k8 -o flops
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module Error RunTime (usec) MFLOPS
1 4.0146e-13 0.0098 1435.8077
2 -1.4166e-13 0.0085 819.7111
3 4.7184e-14 0.0080 2122.7938
4 -1.2557e-13 0.0075 2008.2427
5 -1.3800e-13 0.0156 1854.9566
6 3.2380e-13 0.0145 2004.1944
7 -8.4583e-11 0.0204 588.2436
8 3.4867e-13 0.0148 2023.0560
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 1025.5139
MFLOPS(2) = 1110.0527
MFLOPS(3) = 1612.1846
MFLOPS(4) = 2032.3280
------------------------------------------------------------------------
gcc -O2 -msse2 -DUNIX flops.c -m32 -march=i686 -mtune=generic -mfpmath=sse -o flops
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module Error RunTime (usec) MFLOPS
1 4.0146e-13 0.0086 1631.9561
2 -1.4166e-13 0.0078 901.3518
3 4.7184e-14 0.0078 2186.7984
4 -1.2557e-13 0.0073 2066.6088
5 -1.3800e-13 0.0156 1856.8126
6 3.2380e-13 0.0142 2038.3128
7 -8.4583e-11 0.0203 591.6425
8 3.4867e-13 0.0163 1840.7288
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 1115.7724
MFLOPS(2) = 1129.3849
MFLOPS(3) = 1621.5578
MFLOPS(4) = 1997.4742
--------------------------------------------------------------------
gcc -O2 -msse2 -DUNIX flops.c -m32 -march=i686 -mtune=generic -mfpmath=sse -ftree-vectorize -o flops
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module Error RunTime (usec) MFLOPS
1 4.0146e-13 0.0089 1573.2115
2 -1.4166e-13 0.0078 897.7396
3 4.7184e-14 0.0078 2173.6899
4 -1.2557e-13 0.0073 2057.7493
5 -1.3800e-13 0.0157 1852.1796
6 3.2380e-13 0.0141 2060.9484
7 -8.4583e-11 0.0201 595.5424
8 3.4867e-13 0.0162 1847.8153
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 1110.9305
MFLOPS(2) = 1131.4868
MFLOPS(3) = 1620.0114
MFLOPS(4) = 2003.6594
---------------------------------------------------------------------
gcc -Os -DUNIX flops.c -m64 -march=k8 -mtune=generic -o flops
FLOPS C Program (Double Precision), V2.0 18 Dec 1992
Module Error RunTime (usec) MFLOPS
1 4.0146e-13 0.0096 1450.9217
2 -1.4166e-13 0.0090 776.3812
3 4.7184e-14 0.0078 2180.2247
4 -1.2557e-13 0.0077 1953.0823
5 -1.3800e-13 0.0151 1925.1909
6 3.2380e-13 0.0140 2078.2574
7 -8.4583e-11 0.0212 566.5452
8 3.4867e-13 0.0145 2064.3871
Iterations = 512000000
NullTime (usec) = 0.0000
MFLOPS(1) = 983.3900
MFLOPS(2) = 1094.5643
MFLOPS(3) = 1624.8007
MFLOPS(4) = 2069.8902