Fwd: Re: fma operation::Result

"Martin Ettl" <ettl.martin@xxxxxx> · Fri, 10 Apr 2009 22:24:09 +0200

-------- Original Message --------
Date: Fri, 10 Apr 2009 13:19:19 -0700
From: Brian Budge <brian.budge@xxxxxxxxx>
To: Martin Ettl <ettl.martin@xxxxxx>
Subject: Re: fma operation::Result

Ah, back to the gcc mailing list...  I can't answer that one.

  Brian

On Fri, Apr 10, 2009 at 1:16 PM, Martin Ettl <ettl.martin@xxxxxx> wrote:

> yeah, got it!
>
> compiled as you suggested:
>  g++ -march=native -O3 TestFma.cpp -lrt
>
> ops: acc is 1.34155e+11 in 0.423242 seconds
> fma: acc is 1.34155e+11 in 0.373923 seconds
>
> It works fine! Great.
>
> But what was the reason? It was the -ansi flag I used in my Makefile. Here
> is the output with the -ansi flag:
>
> g++ -march=native -O3 -ansi TestFma.cpp -lrt
>
> ops: acc is 1.34155e+11 in 0.406768 seconds
> fma: acc is 1.34155e+11 in 2.18255 seconds
>
> Is this a bug or a feature of gcc?
>
>
> -------- Original Message --------
> > Date: Fri, 10 Apr 2009 13:10:42 -0700
> > From: Brian Budge <brian.budge@xxxxxxxxx>
> > To: Martin Ettl <ettl.martin@xxxxxx>
> > Subject: Re: fma operation::Result
>
> > Very strange... try just this:
> >
> > g++ -march=native -O3 testFma.cpp -lrt
> >
> > This is the exact command line I used.  I also use gcc 4.3, and also run on
> > a Core 2 architecture.
> >
> > On Fri, Apr 10, 2009 at 1:07 PM, Martin Ettl <ettl.martin@xxxxxx> wrote:
> >
> > > Indeed,
> > >
> > > the results:
> > > g++-4.3 -c -O3 -W -Wall -ansi -Wno-write-strings -fno-strict-aliasing :
> > > ops: acc is 1.34155e+11 in 0.438047 seconds
> > > fma: acc is 1.34155e+11 in 2.87379 seconds
> > >
> > > and
> > > g++-4.3 -c -O3 -march=native -W -Wall -ansi :
> > >
> > > ops: acc is 1.34155e+11 in 0.416 seconds
> > > fma: acc is 1.34155e+11 in 2.62504 seconds
> > >
> > >
> > > That's really crazy. I also tried older versions of g++ (g++ 4.1, g++ 4.2).
> > > It's always the same.
> > >
> > >
> > > -------- Original Message --------
> > > > Date: Fri, 10 Apr 2009 12:58:38 -0700
> > > > From: Brian Budge <brian.budge@xxxxxxxxx>
> > > > To: Martin Ettl <ettl.martin@xxxxxx>
> > > > Subject: Re: fma operation::Result
> > >
> > > > Wow, that's really bad.  Try setting -O3 -march=native.  And try with and
> > > > without fused-madd.
> > > >
> > > > On Fri, Apr 10, 2009 at 12:52 PM, Martin Ettl <ettl.martin@xxxxxx> wrote:
> > > >
> > > > > My result is as follows:
> > > > > ops: acc is 1.34155e+11 in 0.422607 seconds
> > > > > fma: acc is 1.34155e+11 in 2.88275 seconds
> > > > >
> > > > > I used the flags: -O2 -W -Wall -ansi -Wno-write-strings
> > > > > -fno-strict-aliasing  -mfused-madd
> > > > >
> > > > > What am I doing wrong?
> > > > >
> > > > >
> > > > > > -------- Original Message --------
> > > > > > > Date: Fri, 10 Apr 2009 12:23:22 -0700
> > > > > > > From: Brian Budge <brian.budge@xxxxxxxxx>
> > > > > > > To: Martin Ettl <ettl.martin@xxxxxx>
> > > > > > > Subject: Re: fma operation::Result
> > > > >
> > > > > > > Your example was a bit complex.  I've attached a simpler one (you'll need
> > > > > > > to link with -lrt).
> > > > > >
> > > > > > My results:
> > > > > >
> > > > > > ops: acc is 1.34155e+11 in 0.325366 seconds
> > > > > > fma: acc is 1.34155e+11 in 0.302934 seconds
> > > > > >
> > > > > >
> > > > > > > On Fri, Apr 10, 2009 at 12:03 PM, Brian Budge <brian.budge@xxxxxxxxx> wrote:
> > > > > >
> > > > > > > What are your compile options?
> > > > > > >
> > > > > > > > My hunch is that your fma is not being inlined and so you are incurring
> > > > > > > > extra function call overhead.
> > > > > > >
> > > > > > >
> > > > > > > > On Fri, Apr 10, 2009 at 11:53 AM, Martin Ettl <ettl.martin@xxxxxx> wrote:
> > > > > > >
> > > > > > >> Hello,
> > > > > > >>
> > > > > > > >> I have done as you suggested, but I am puzzled by the result my test case
> > > > > > > >> produced on my machine (Intel Core 2 Duo; gcc-4.3.3; Ubuntu Linux 8.10). The
> > > > > > > >> fma() function call is ~50% slower than the expression (a*b)+c. Could that
> > > > > > > >> be?
> > > > > > > >> I have attached the test case to this mail. It is a small program that
> > > > > > > >> counts the processor cycles the operation needs to execute. My output is
> > > > > > > >> as follows:
> > > > > > >> #Iteration      Cycles using fma()
> > > > > > >> 0               3566020
> > > > > > >> 1               3501900
> > > > > > >> 2               3442240
> > > > > > >> 3               3481820
> > > > > > >> 4               3449920
> > > > > > >> #Iteration      Cycles NOT using fma()
> > > > > > >> 0               2160020
> > > > > > >> 1               2008120
> > > > > > >> 2               2002040
> > > > > > >> 3               2121120
> > > > > > >> 4               2028140
> > > > > > >>
> > > > > > >> Best regards
> > > > > > >>
> > > > > > >> Martin
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > >>
> > > > > > > >> -------- Original Message --------
> > > > > > > >> > Date: Fri, 10 Apr 2009 10:18:48 -0700
> > > > > > > >> > From: Brian Budge <brian.budge@xxxxxxxxx>
> > > > > > > >> > To: Martin Ettl <ettl.martin@xxxxxx>
> > > > > > > >> > CC: gcc-help@xxxxxxxxxxx
> > > > > > > >> > Subject: Re: fma operation
> > > > > > >>
> > > > > > > >> > This will depend on your machine.  The way to know is to test it by
> > > > > > > >> > calling these things in a giant loop (at least millions of times) and
> > > > > > > >> > using clock_gettime to time each loop.  So one giant loop with * and +,
> > > > > > > >> > and another with fma.  Unless your hardware has a special madd-type
> > > > > > > >> > instruction, this will likely produce the exact same code.
> > > > > > >> >
> > > > > > > >> > On Fri, Apr 10, 2009 at 9:37 AM, Martin Ettl <ettl.martin@xxxxxx> wrote:
> > > > > > >> >
> > > > > > >> > > Hi,
> > > > > > >> > >
> > > > > > > >> > > I have run some tests with the library version of the fma function
> > > > > > > >> > > (http://en.wikipedia.org/wiki/Multiply-accumulate).
> > > > > > > >> > > I tested the code below on my Linux machine. Now my question is: which
> > > > > > > >> > > version is faster, (1) or (2)? How can I determine which operation
> > > > > > > >> > > executes faster on my machine?
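> > > > > > > >> > > #include <cmath>   // declares fma()
> > > > > > > >> > >
> > > > > > > >> > > int main()
> > > > > > > >> > > {
> > > > > > > >> > >   double a = 10.2;
> > > > > > > >> > >   double b = 12.;
> > > > > > > >> > >   double c = 9.;
> > > > > > > >> > >   double e = a * b + c;        // (1) separate multiply and add
> > > > > > > >> > >   //double e = fma(a, b, c);   // (2) library fused multiply-add
> > > > > > > >> > >   (void)e;                     // silence the unused-variable warning
> > > > > > > >> > > }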
> > > > > > >> > >
> > > > > > >> > > Thanks in advance!
> > > > > > >> > >
> > > > > > >> > > Best regards
> > > > > > >> > >
> > > > > > >> > > Ettl Martin
> > > > > > >> > >
> > > > > > >> > > --
> > > > > > >> > > Psssst! Schon vom neuen GMX MultiMessenger gehört? Der
> > kann`s
> > > > mit
> > > > > > >> > allen:
> > > > > > >> > > http://www.gmx.net/de/go/multimessenger01
> > > > > > >> > >
> > > > > > >>
> > > > > > >> --
> > > > > > >> Neu: GMX FreeDSL Komplettanschluss mit DSL 6.000 Flatrate +
> > > > > > >> Telefonanschluss für nur 17,95 Euro/mtl.!*
> > > > > > >>
> > > >
> http://dslspecial.gmx.de/freedsl-surfflat/?ac=OM.AD.PD003K11308T4569a
> > > > > > >>
> > > > > > >
> > > > > > >
> > > > >
> > > > >
> > >
> > >
>
>
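
The timing method Brian describes earlier in the thread (one giant loop with
* and +, another with fma, each timed with clock_gettime) might look roughly
like the sketch below. This is not the TestFma.cpp attachment, which is not
preserved in this thread. The names Timing, now_seconds, time_mul_add and
time_fma, the 0.5 multiplier, and the volatile read are all assumptions made
for illustration.

```cpp
#include <cassert>  // assert
#include <cmath>    // fma
#include <cstddef>  // std::size_t
#include <time.h>   // clock_gettime, CLOCK_MONOTONIC (POSIX)

// Hypothetical result type: the accumulated value and the elapsed wall time.
struct Timing {
    double acc;
    double seconds;
};

// Hypothetical helper: current monotonic clock reading in seconds.
static double now_seconds() {
    timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;  // avoid an unused-variable warning when NDEBUG is defined
    return static_cast<double>(ts.tv_sec) +
           static_cast<double>(ts.tv_nsec) * 1e-9;
}

// Loop (1): separate multiply and add, acc = i * b + acc.
Timing time_mul_add(std::size_t n) {
    volatile double b_src = 0.5;  // volatile read so b is not a compile-time constant
    const double b = b_src;
    double acc = 0.0;
    const double t0 = now_seconds();
    for (std::size_t i = 0; i < n; ++i)
        acc = static_cast<double>(i) * b + acc;
    const double t1 = now_seconds();
    Timing result = {acc, t1 - t0};
    return result;
}

// Loop (2): the same recurrence through the library fma() call.
Timing time_fma(std::size_t n) {
    volatile double b_src = 0.5;
    const double b = b_src;
    double acc = 0.0;
    const double t0 = now_seconds();
    for (std::size_t i = 0; i < n; ++i)
        acc = fma(static_cast<double>(i), b, acc);
    const double t1 = now_seconds();
    Timing result = {acc, t1 - t0};
    return result;
}
```

Calling both functions from main with a large n and printing acc and seconds
gives output of the same shape as the "ops:" and "fma:" lines quoted above.
Comparing the two acc values is a cheap check that both loops did the same
work. Older glibc versions need -lrt for clock_gettime, as noted in the thread.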