-------- Original Message --------
Date: Fri, 10 Apr 2009 13:19:19 -0700
From: Brian Budge <brian.budge@xxxxxxxxx>
To: Martin Ettl <ettl.martin@xxxxxx>
Subject: Re: fma operation::Result

Ah, back to the gcc mailing list... I can't answer that one.

  Brian

On Fri, Apr 10, 2009 at 1:16 PM, Martin Ettl <ettl.martin@xxxxxx> wrote:

> Yeah, got it!
>
> Compiled as you suggested:
> g++ -march=native -O3 TestFma.cpp -lrt
>
> ops: acc is 1.34155e+11 in 0.423242 seconds
> fma: acc is 1.34155e+11 in 0.373923 seconds
>
> It works fine! Great.
>
> But what was the reason? It was the -ansi flag I used in my Makefile. Here is
> the output with the -ansi flag:
>
> g++ -march=native -O3 -ansi TestFma.cpp -lrt
>
> ops: acc is 1.34155e+11 in 0.406768 seconds
> fma: acc is 1.34155e+11 in 2.18255 seconds
>
> Is this a bug or a feature of gcc?
>
>
> -------- Original Message --------
> > Date: Fri, 10 Apr 2009 13:10:42 -0700
> > From: Brian Budge <brian.budge@xxxxxxxxx>
> > To: Martin Ettl <ettl.martin@xxxxxx>
> > Subject: Re: fma operation::Result
>
> > Very strange... try just this:
> >
> > g++ -march=native -O3 testFma.cpp -lrt
> >
> > This is the exact command line I used. I also use gcc 4.3, and also run on
> > a Core 2 architecture.
> >
> > On Fri, Apr 10, 2009 at 1:07 PM, Martin Ettl <ettl.martin@xxxxxx> wrote:
> >
> > > Indeed,
> > >
> > > the results:
> > > g++-4.3 -c -O3 -W -Wall -ansi -Wno-write-strings -fno-strict-aliasing :
> > > ops: acc is 1.34155e+11 in 0.438047 seconds
> > > fma: acc is 1.34155e+11 in 2.87379 seconds
> > >
> > > and
> > > g++-4.3 -c -O3 -march=native -W -Wall -ansi :
> > >
> > > ops: acc is 1.34155e+11 in 0.416 seconds
> > > fma: acc is 1.34155e+11 in 2.62504 seconds
> > >
> > >
> > > That's really crazy. I also tried older versions of g++ (g++ 4.1, g++ 4.2).
> > > It's always the same.
> > >
> > >
> > > -------- Original Message --------
> > > > Date: Fri, 10 Apr 2009 12:58:38 -0700
> > > > From: Brian Budge <brian.budge@xxxxxxxxx>
> > > > To: Martin Ettl <ettl.martin@xxxxxx>
> > > > Subject: Re: fma operation::Result
> > >
> > > > Wow, that's really bad. Try setting -O3 -march=native. And try with and
> > > > without fused-madd.
> > > >
> > > > On Fri, Apr 10, 2009 at 12:52 PM, Martin Ettl <ettl.martin@xxxxxx> wrote:
> > > >
> > > > > My result is as follows:
> > > > > ops: acc is 1.34155e+11 in 0.422607 seconds
> > > > > fma: acc is 1.34155e+11 in 2.88275 seconds
> > > > >
> > > > > I used the flags: -O2 -W -Wall -ansi -Wno-write-strings
> > > > > -fno-strict-aliasing -mfused-madd
> > > > >
> > > > > What am I doing wrong?
> > > > >
> > > > >
> > > > > -------- Original Message --------
> > > > > > Date: Fri, 10 Apr 2009 12:23:22 -0700
> > > > > > From: Brian Budge <brian.budge@xxxxxxxxx>
> > > > > > To: Martin Ettl <ettl.martin@xxxxxx>
> > > > > > Subject: Re: fma operation::Result
> > > > >
> > > > > > Your example was a bit complex. I've attached a simpler one (you'll
> > > > > > need to link with -lrt).
> > > > > >
> > > > > > My results:
> > > > > >
> > > > > > ops: acc is 1.34155e+11 in 0.325366 seconds
> > > > > > fma: acc is 1.34155e+11 in 0.302934 seconds
> > > > > >
> > > > > >
> > > > > > On Fri, Apr 10, 2009 at 12:03 PM, Brian Budge <brian.budge@xxxxxxxxx> wrote:
> > > > > >
> > > > > > > What are your compile options?
> > > > > > >
> > > > > > > My hunch is that your fma is not being inlined, so you are incurring
> > > > > > > extra function call overhead.
> > > > > > >
> > > > > > >
> > > > > > > On Fri, Apr 10, 2009 at 11:53 AM, Martin Ettl <ettl.martin@xxxxxx> wrote:
> > > > > > >
> > > > > > >> Hello,
> > > > > > >>
> > > > > > >> I have done as you suggested.
> > > > > > >> But I am puzzled by the result my testcase produced on my
> > > > > > >> machine (Intel Core 2 Duo; gcc-4.3.3; Ubuntu Linux 8.10). The
> > > > > > >> fma() function call is ~50% slower than the expression (a*b)+c.
> > > > > > >> Could that be?
> > > > > > >> I have attached the testcase to this mail. It is a small program
> > > > > > >> that counts the processor cycles the operation needs to execute.
> > > > > > >> My output is as follows:
> > > > > > >> #Iteration Cycles using fma()
> > > > > > >> 0 3566020
> > > > > > >> 1 3501900
> > > > > > >> 2 3442240
> > > > > > >> 3 3481820
> > > > > > >> 4 3449920
> > > > > > >> #Iteration Cycles NOT using fma()
> > > > > > >> 0 2160020
> > > > > > >> 1 2008120
> > > > > > >> 2 2002040
> > > > > > >> 3 2121120
> > > > > > >> 4 2028140
> > > > > > >>
> > > > > > >> Best regards
> > > > > > >>
> > > > > > >> Martin
> > > > > > >>
> > > > > > >>
> > > > > > >> -------- Original Message --------
> > > > > > >> > Date: Fri, 10 Apr 2009 10:18:48 -0700
> > > > > > >> > From: Brian Budge <brian.budge@xxxxxxxxx>
> > > > > > >> > To: Martin Ettl <ettl.martin@xxxxxx>
> > > > > > >> > CC: gcc-help@xxxxxxxxxxx
> > > > > > >> > Subject: Re: fma operation
> > > > > > >>
> > > > > > >> > This will depend on your machine. The way to know is to test it
> > > > > > >> > by calling these things in a giant loop (at least millions of
> > > > > > >> > times) and using clock_gettime to time each loop. So one giant
> > > > > > >> > loop with * and +, and another with fma. Unless your hardware
> > > > > > >> > has a special madd-type instruction, this will likely produce
> > > > > > >> > the exact same code.
> > > > > > >> >
> > > > > > >> > On Fri, Apr 10, 2009 at 9:37 AM, Martin Ettl <ettl.martin@xxxxxx> wrote:
> > > > > > >> >
> > > > > > >> > > Hi,
> > > > > > >> > >
> > > > > > >> > > I have run tests with the library version of the fma function
> > > > > > >> > > (http://en.wikipedia.org/wiki/Multiply-accumulate).
> > > > > > >> > > I tested the code below on my Linux machine. Now, my question
> > > > > > >> > > is: which version is faster, version (1) or (2)? How can I
> > > > > > >> > > determine which operation executes faster on my machine?
> > > > > > >> > >
> > > > > > >> > > #include <math.h> // declares fma()
> > > > > > >> > >
> > > > > > >> > > int main()
> > > > > > >> > > {
> > > > > > >> > >   double a=10.2;
> > > > > > >> > >   double b=12.;
> > > > > > >> > >   double c=9.;
> > > > > > >> > >   double e =a*b+c; // (1)
> > > > > > >> > >   //double e =fma(a,b,c); // (2)
> > > > > > >> > > }
> > > > > > >> > >
> > > > > > >> > > Thanks in advance!
> > > > > > >> > >
> > > > > > >> > > Best regards
> > > > > > >> > >
> > > > > > >> > > Ettl Martin
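
The timing approach Brian describes in his first reply (one giant loop with * and +, another with fma, each timed with clock_gettime) might look roughly like the sketch below. This is not the TestFma.cpp attached in the thread, which is not reproduced here. The names Result, seconds_now, time_mul_add, time_fma and report are made up for illustration, and the accumulation done inside the loops is an assumed workload.

```cpp
// Sketch only. Result, seconds_now, time_mul_add, time_fma and report are
// hypothetical names; the loop body is an assumed workload, not TestFma.cpp.
#include <math.h>   // fma()
#include <stdio.h>  // printf()
#include <time.h>   // clock_gettime(), CLOCK_MONOTONIC (POSIX)

struct Result {
    double acc;      // accumulated value; using it keeps the loop from being dropped
    double seconds;  // wall-clock time spent in the loop
};

// Current monotonic time in seconds.
static double seconds_now()
{
    timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return static_cast<double>(ts.tv_sec) + 1e-9 * static_cast<double>(ts.tv_nsec);
}

// Version (1): explicit multiply followed by add.
Result time_mul_add(double a, long iters)
{
    Result r;
    r.acc = 0.0;
    const double start = seconds_now();
    for (long i = 0; i < iters; ++i)
        r.acc = a * static_cast<double>(i) + r.acc;
    r.seconds = seconds_now() - start;
    return r;
}

// Version (2): the library fma() call.
Result time_fma(double a, long iters)
{
    Result r;
    r.acc = 0.0;
    const double start = seconds_now();
    for (long i = 0; i < iters; ++i)
        r.acc = fma(a, static_cast<double>(i), r.acc);
    r.seconds = seconds_now() - start;
    return r;
}

// Runs both loops and prints the results in the same shape as the thread's output.
void report(long iters)
{
    const Result ops = time_mul_add(10.2, iters);
    const Result fused = time_fma(10.2, iters);
    printf("ops: acc is %g in %g seconds\n", ops.acc, ops.seconds);
    printf("fma: acc is %g in %g seconds\n", fused.acc, fused.seconds);
}
```

To try it, call report(10000000) from main and build with the command line used in the thread, for example g++ -march=native -O3 bench.cpp -lrt (bench.cpp is a placeholder file name). Building a second time with -ansi added should allow the two configurations discussed above to be compared. On newer glibc versions -lrt may no longer be required.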