Re: GCC is 7 times slower than Intel? How to optimize? Need help!!

Arturs Zoldners <az@xxxxxxx> · Wed, 08 Oct 2008 21:30:19 +0300

Hi!

On Mon, 2008-10-06 at 23:23 -0700, jackfrost wrote: 
> I've tried  -ffast-math.
> with the same result.

On Tue, 2008-10-07 at 13:05 -0400, Michael Meissner wrote:
> IIRC, one of the spec 2006 suite is heavily dominated by calls to exp.
> So programs do exist that are dominated by the 'exp' function.

If you really do need the exponential (that is, it was not in your code
sample by chance), and if you can tolerate a limited argument range and
limited precision, you can try a well-known hack that computes it
faster, at least on some hardware:

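#include <cassert>
#include <cmath>

// Approximates 2^arg for -126 <= arg < 127 (assumes IEEE 754 single precision).
template<typename T> float fast_2pow(T arg)
{
    assert(arg >= -126);  // below this the exponent field would become 0
    assert(arg <   127);

    // Type punning through a union; GCC supports this as an extension.
    typedef union
    {
        unsigned int u;
        float        f;
    } uf_t;

    // arg as fixed point (23 fractional bits) plus the exponent bias
    const uf_t x = {static_cast<unsigned int>(int(arg * (1 << 23)) + int(127 * (1 << 23)))};
    // clear mantissa bits, f: 2 ^ floor(arg)
    const uf_t exp = { x.u & ~((1u << 23) - 1)};
    // keep mantissa bits, set exponent to the bias (sign stays 0), f: arg - floor(arg) + 1
    const uf_t man = {(x.u &  ((1u << 23) - 1)) | (127u << 23)};

    // second-order polynomial in the mantissa, scaled by 2 ^ floor(arg)
    return exp.f * ((man.f * man.f + 2.0f) * 0.3330234735869276f);
}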

// e^arg == 2^(arg / ln 2), so rescale the argument and reuse fast_2pow
template<typename T> float fast_exp(T arg)
{
    static const T scale = 1./log(2.);
    return fast_2pow(arg * scale);
}

(The template parameter is meant to be float or double, not an integer type :)
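
To make the bit manipulation above easier to follow, here is a minimal
sketch of its core idea: writing a biased exponent straight into the
exponent field of a float produces an exact power of two. The helper name
pow2_from_bits is hypothetical, and the sketch assumes IEEE 754 single
precision (8 exponent bits, bias 127, 23 mantissa bits). It uses memcpy
instead of a union for the bit copy.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical helper (not part of the code above): returns 2^e exactly
// for an integer e in the normal float range, assuming IEEE 754 binary32.
float pow2_from_bits(int e)
{
    assert(e >= -126 && e <= 127);  // biased exponent must stay in 1..254

    // Biased exponent in bits 23..30, sign bit and mantissa bits all zero.
    const std::uint32_t bits = static_cast<std::uint32_t>(e + 127) << 23;

    // Reinterpret the bit pattern as a float.
    float f;
    std::memcpy(&f, &bits, sizeof f);
    return f;
}
```

For example, pow2_from_bits(3) yields 8.0f. fast_2pow applies the same
idea to a fixed-point argument, so the integer part of arg lands in the
exponent field and the fractional part lands in the mantissa bits.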

This second-order approximation (continuous values, continuous
derivative, minimized squared relative error) gives a relative error <
0.3% and is ~6.6 times faster than exp(..) on an AMD Turion(tm) 64 X2
with gcc 4.3.0 and the compile switches -march=native -msse2 -O3
-ffast-math -mfpmath=sse, using your code sample:
http://gcc.gnu.org/ml/gcc-help/2008-10/msg00033.html
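
If you want to check the error bound yourself, something like the
following sketch should do. It restates the second-order approximation so
that it compiles on its own (with memcpy instead of the union) and scans
an interval for the largest relative error against std::exp2. The names
bits_to_float, fast_2pow_sketch and max_rel_error are hypothetical, and
IEEE 754 single precision is assumed. It does not measure speed.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reinterpret a 32-bit pattern as a float.
static float bits_to_float(std::uint32_t u)
{
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Same second-order approximation of 2^arg as above, restated for this sketch.
float fast_2pow_sketch(float arg)
{
    assert(arg >= -126.0f && arg < 127.0f);

    // arg as fixed point (23 fractional bits) plus the exponent bias
    const std::uint32_t x = static_cast<std::uint32_t>(
        static_cast<std::int32_t>(arg * 8388608.0f) + (127 << 23));
    const std::uint32_t mask = (1u << 23) - 1u;

    const float pow2 = bits_to_float(x & ~mask);                 // 2 ^ floor(arg)
    const float man  = bits_to_float((x & mask) | (127u << 23)); // arg - floor(arg) + 1

    return pow2 * ((man * man + 2.0f) * 0.3330234735869276f);
}

// Hypothetical helper: largest relative error on [lo, hi], sampled at steps + 1 points.
double max_rel_error(float lo, float hi, int steps)
{
    double worst = 0.0;
    for (int i = 0; i <= steps; ++i) {
        const float a = lo + (hi - lo) * i / steps;
        const double exact = std::exp2(static_cast<double>(a));
        const double err = std::fabs(fast_2pow_sketch(a) - exact) / exact;
        if (err > worst) worst = err;
    }
    return worst;
}
```

Calling max_rel_error(-10.0f, 10.0f, 100000) should return a value below
0.003, in line with the < 0.3% figure above.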

I have also just tested a fourth-order approximation. It gives a relative
error < 1e-5 and is still ~5.7 times faster than exp(..).

Regards,
Arturs Zoldners