Michael Meissner wrote:
> On Wed, Jan 07, 2009 at 10:21:28AM -0500, Wirawan Purwanto wrote:
>> Hi Michael,
>>
>> Thanks for the answer. I would like to know if someone has investigated
>> this issue for some benchmark or real-world cases. Is there any
>> write-up/report/paper on this thing?
>
> I suspect many people have done tests, but oftentimes have not published the
> results. For example, when I worked for AMD, I sometimes did SPEC runs with
> -mtune=generic, -mtune=athlon, -mtune=barcelona, or -mtune=core2 to see how the
> tunings affected the real hardware. I recall that there were a few benchmarks
> which saw noticeable differences (how integer-to-fp conversions were done was
> one that I looked at for a bit).
>
-mtune=barcelona frequently speeds up vectorized loops on Core i7 by more than
a factor of 2, compared with -mtune=generic. On Core 2, of course, it's not
clear-cut: it speeds up more of my gfortran cases than it slows down, while the
reverse is true for g++. There's not much mystery in this, as the major
differences come from the alignment requirements of the various CPU models.

I would have expected integer-to-fp conversion to be affected more by
-msse/-msse2 than by -mtune.

I haven't detected much interest in published results, when the benchmarks at
http://sites.google.com/site/tprincesite/levine-callahan-dongarra-vectors, or
even the originals at netlib, are easy enough to run yourself.
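For anyone who wants to try this on their own hardware, here is a minimal sketch of a timing kernel. It is NOT one of the Levine-Callahan-Dongarra loops; the names saxpy and time_saxpy are hypothetical, chosen for illustration. The loop is simple enough that GCC can auto-vectorize it at -O3, so building the same file with different -mtune settings and comparing the timings gives a rough, informal picture of how the tuning choice affects vectorized code.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical kernel: single-precision y = a*x + y.
   'restrict' promises the compiler that x and y do not overlap,
   which lets GCC vectorize the loop without runtime alias checks. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}

/* Run the kernel 'reps' times and return the elapsed processor time
   in seconds, as measured by the standard C clock() function. */
double time_saxpy(size_t n, float a, const float *restrict x,
                  float *restrict y, int reps)
{
    assert(reps > 0);
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        saxpy(n, a, x, y);
    clock_t stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Call time_saxpy from a small driver, build it once with gcc -O3 -mtune=generic and once with gcc -O3 -mtune=barcelona, and compare. Passing x + 1 and y + 1 instead of x and y is one way to probe the misaligned-data case, which is where the alignment assumptions behind different tunings would be expected to matter most.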