On Dec 15, 2006, at 01:16 , Ron wrote:
At 05:39 PM 12/14/2006, Alexander Staubo wrote:
On Dec 14, 2006, at 20:28 , Ron wrote:
Can you do runs with just CFLAGS="-O3" and just
CFLAGS="-msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe" as well?
All right. From my perspective, the effect of -O3 is significant,
whereas architecture-related optimizations have no statistically
significant effect.
Is this opinion? Or have you rerun the tests using the flags I
suggested? If so, can you post the results?
Sorry, I neglected to include the pertinent graph:
http://purefiction.net/paste/pgbench2.pdf
The raw data:
CFLAGS="-msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe":
18480.899621 19977.162108 19640.562003 19823.585944 19500.293284
19964.383540 20228.664827
20515.766366 19956.431120 19740.795459 20184.551390 19984.907398
20457.260691 19771.395220
20159.225628 19907.248149 20197.580815 19947.498185 20209.450748
20088.501904
CFLAGS="-O3":
23814.672315 26846.761905 27137.807960 26957.898233 27109.057570
26997.227925 27291.056939
27565.553643 27422.624323 27392.397185 27757.144967 27402.365372
27563.365421 27349.544685
27544.658154 26957.200592 27523.824623 27457.380654 27052.910082
24452.819263
CFLAGS="-O0":
18440.181894 19207.882300 19894.432185 19635.625622 19876.858884
20032.597042 19683.597973
20370.166669 19989.157881 20207.343510 19993.745956 20081.353580
20356.416424 20047.810017
20319.834190 19417.807528 19906.788454 20536.039929 19491.308046
20002.144230
CFLAGS="-O3 -msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe":
23830.358351 26162.203569 25569.091264 26762.755665 26590.822550
26864.908197 26608.029665
26796.116921 26323.742015 26692.576261 26878.859132 26106.770425
26328.371664 26755.595130
25488.304946 26635.527959 26377.485023 24817.590708 26480.245737
26223.427801
If "-O3 -msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe"
results in a 30-40% speedup over "-O0", and
"-msse2 -mfpmath=sse -funroll-loops -m64 -march=opteron -pipe"
results in a 5-10% speedup, then ~1/8 - 1/3 of the total possible
speedup is due to arch-specific optimizations.
Unfortunately, I don't see a 5-10% speedup; "-O0" and "-msse2 ..."
are statistically identical.
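
For anyone who wants to check that claim against the raw data above, here is a minimal sketch in Python (standard library only). It computes the mean of each run, the speedup relative to "-O0", and Welch's t statistic against the "-O0" sample. The dictionary keys are shorthand labels of my own ("arch" stands for the full "-msse2 ... -pipe" flag set), and treating |t| below roughly 2 as "no detectable difference" is a rule-of-thumb assumption, not a formal test.

```python
from math import sqrt
from statistics import mean, variance

# Values copied from the raw data in this message (20 runs per flag set).
# Keys are shorthand labels; "arch" means the -msse2 ... -pipe flags.
runs = {
    "-O0": [
        18440.181894, 19207.882300, 19894.432185, 19635.625622, 19876.858884,
        20032.597042, 19683.597973, 20370.166669, 19989.157881, 20207.343510,
        19993.745956, 20081.353580, 20356.416424, 20047.810017, 20319.834190,
        19417.807528, 19906.788454, 20536.039929, 19491.308046, 20002.144230,
    ],
    "arch": [
        18480.899621, 19977.162108, 19640.562003, 19823.585944, 19500.293284,
        19964.383540, 20228.664827, 20515.766366, 19956.431120, 19740.795459,
        20184.551390, 19984.907398, 20457.260691, 19771.395220, 20159.225628,
        19907.248149, 20197.580815, 19947.498185, 20209.450748, 20088.501904,
    ],
    "-O3": [
        23814.672315, 26846.761905, 27137.807960, 26957.898233, 27109.057570,
        26997.227925, 27291.056939, 27565.553643, 27422.624323, 27392.397185,
        27757.144967, 27402.365372, 27563.365421, 27349.544685, 27544.658154,
        26957.200592, 27523.824623, 27457.380654, 27052.910082, 24452.819263,
    ],
    "-O3 arch": [
        23830.358351, 26162.203569, 25569.091264, 26762.755665, 26590.822550,
        26864.908197, 26608.029665, 26796.116921, 26323.742015, 26692.576261,
        26878.859132, 26106.770425, 26328.371664, 26755.595130, 25488.304946,
        26635.527959, 26377.485023, 24817.590708, 26480.245737, 26223.427801,
    ],
}


def welch_t(a, b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    standard_error = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / standard_error


baseline = runs["-O0"]
for label, values in runs.items():
    speedup = mean(values) / mean(baseline) - 1.0
    t = welch_t(values, baseline)
    print(f"{label:9s} mean={mean(values):9.1f} speedup={speedup:+.1%} t={t:+.2f}")
```

The first run in every set is noticeably slower than the rest, so dropping it as a warm-up (for example `values[1:]`) may be worth trying before drawing conclusions.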
Alexander.