Re: compile-time conversion of floating-point expressions to long longs

"Nelson H. F. Beebe" <beebe@xxxxxxxxxxxxx> · Mon, 8 May 2006 05:53:51 -0600 (MDT)

Jim Cromie <jim.cromie@xxxxxxxxx> asks on Sun, 07 May 2006 13:41:41 -0400
why gcc won't evaluate an initialization involving frexp() at compile time.

In general, compilers only permit objects with static storage duration
to be initialized with constant expressions that involve no external
references.  This is mandated by the ISO C Standards.  From ISO/IEC
9899:1999 (E):

>> ...
>>      6.6 Constant expressions
>>
>>      Syntax
>>
>> 1	       constant-expression:
>> 		       conditional-expression
>>
>>      Description
>>
>> 2    A constant expression can be evaluated during translation rather
>>      than runtime, and accordingly may be used in any place that a
>>      constant may be.
>>
>>      Constraints
>>
>> 3    Constant expressions shall not contain assignment, increment,
>>      decrement, function-call, or comma operators, except when they
                   ^^^^^^^^^^^^^ NOTE
>>      are contained within a subexpression that is not evaluated.
>>
>> 4    Each constant expression shall evaluate to a constant that is in
>>      the range of representable values for its type.
>> ...

Recall that it is common practice today to use shared libraries for
widely-used code.  Thus, an initialization like "const float root13 =
sqrt(13);" at outer level could have different meanings on different
runs or different systems of the same architecture, depending on which
version of the math library was used, and the accuracy of the
implementation of the sqrt() function.  Also, if the compiler is
cross-compiling for another architecture, it would not have access to
that other platform's run-time library, and thus could not compute the
expression.  Thus, the value of root13 is not really a constant as far
as machine implementations are concerned.

Historically, the meaning of "float", and its precision, could change
across CPU architectures.  For example, I have a DEC PDP-10 on my
desktop, with a 36-bit word, and three bits more precision in float
than my other systems.  All modern machines use the IEEE 754 system,
so that is no longer the case.

If you want to have initializations with mathematical expressions, the
correct way is to do one of two things:

(1) For C99 compilation, start with an accurate value of the
    expression evaluated in higher precision, perhaps via a symbolic
    algebra system with user-specifiable precision, and then express
    that as a hexadecimal constant.  For the above example, I might
    then write:

	const float root13 = +0x1.cd82b446159f360fedeccf37f9e5p+1F; /* sqrt(13) */

    Some compilers will complain about the extra digits, but that
    doesn't matter: you'll get a correct-to-last-bit value for root13.

(2) For either C99 or pre-C99, express the constant as the sum of an
    exactly-representable value and a small correction.  For example,

	const float root13 = 945173.0F / 262144.0F + 2.4168214111681192212674704961299214e-06;

    Both 945173.0F and 262144.0F are exactly representable in the IEEE
    754 32-bit floating-point format, which has 24 bits of precision.
    Older floating-point systems always had at least that much
    precision for the float data type.  Importantly, 262144.0F is a
    power of two (2 to the 18th power), so the division is EXACT,
    since in the absence of underflow or overflow, multiplication or
    division by a power of the base simply adjusts the exponent,
    without changing the significand.  Thus, the first term is
    computed exactly at compile time.  The second term is a small
    correction, so even if the decimal-to-binary conversion is subject
    to small errors, they will be well beyond the precision of the
    result, and the stored constant computed at compile time will be
    correct to the last bit.
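
As a rough check of the two techniques, the sketch below stores both
forms of root13 and exposes their bit patterns for comparison.  The
helper float_bits() is a hypothetical name introduced only for this
illustration, and the code assumes that float is the 32-bit IEEE 754
format.

```c
#include <stdint.h>
#include <string.h>

/* Technique (1): C99 hexadecimal floating-point constant. */
static const float root13_hex = +0x1.cd82b446159f360fedeccf37f9e5p+1F;

/* Technique (2): exactly-representable head plus a small correction. */
static const float root13_sum =
    945173.0F / 262144.0F + 2.4168214111681192212674704961299214e-06;

/* Hypothetical helper: return the raw bits of a float.
   Assumes float is the 32-bit IEEE 754 format. */
uint32_t float_bits(float x)
{
    uint32_t u;

    memcpy(&u, &x, sizeof u);   /* well-defined way to reinterpret bits */
    return u;
}
```

If both constants are converted correctly, float_bits(root13_hex) and
float_bits(root13_sum) should compare equal.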

Technique (2) is common in carefully-written mathematical software.
As C99 compilers become more common, technique (1) will be used more
often.
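
One plausible way to obtain such a split is sketched below; it is
only an illustration, not necessarily how the numbers above were
produced.  The names struct split and split_constant() are
hypothetical.  The sketch assumes a positive input, and a bit count
smaller than the width of long.  Because it works in double, the
correction it returns is only as accurate as the double input; a
correction with as many digits as the one shown above needs a
higher-precision starting value.

```c
/* Hypothetical type: head numerator and remaining correction. */
struct split {
    long   numerator;    /* head is numerator / 2**bits */
    double correction;   /* x minus the head */
};

/* Split positive x into an exact head with 'bits' fractional bits
   and a small correction.  Assumes x > 0 and bits < width of long. */
struct split split_constant(double x, int bits)
{
    struct split s;
    double scale = (double)(1L << bits);    /* exact power of two */

    s.numerator  = (long)(x * scale);       /* truncates toward zero */
    s.correction = x - (double)s.numerator / scale;
    return s;
}
```

For example, split_constant(0x1.cd82b446159f3p+1, 18), where the
argument is sqrt(13) rounded to double, yields the numerator 945173
used in technique (2).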

It certainly is NOT sufficient to simply express the number in decimal
and assign it with, e.g.,

	const float root13 = 3.60555127546398929311922126747049613F;

The accuracy of decimal-to-binary conversion varies across systems,
and the result is that the constant could get values that differ by
one or two units in the last place from the exact value obtained by
either of the techniques shown above.
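
To see how a particular compiler treats the decimal form, one can
count the units in the last place between it and a reference value.
The sketch below is illustrative only: ulp_distance() is a
hypothetical helper, the reference is the hexadecimal constant from
technique (1) rounded by hand to 24 bits, and the code assumes
positive finite values in the 32-bit IEEE 754 format.

```c
#include <stdint.h>
#include <string.h>

/* Decimal form, whose conversion accuracy depends on the compiler. */
static const float root13_decimal =
    3.60555127546398929311922126747049613F;

/* Reference: technique (1) constant rounded by hand to 24 bits. */
static const float root13_reference = 0x1.cd82b4p+1F;

/* Hypothetical helper: signed distance from b to a in units in the
   last place.  Assumes both are positive, finite, 32-bit IEEE 754
   floats, for which adjacent values have adjacent bit patterns. */
int32_t ulp_distance(float a, float b)
{
    int32_t ia, ib;

    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    return ia - ib;
}
```

A result of zero from ulp_distance(root13_decimal, root13_reference)
means that the compiler converted the decimal constant correctly.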

-------------------------------------------------------------------------------
- Nelson H. F. Beebe                    Tel: +1 801 581 5254                  -
- University of Utah                    FAX: +1 801 581 4148                  -
- Department of Mathematics, 110 LCB    Internet e-mail: beebe@xxxxxxxxxxxxx  -
- 155 S 1400 E RM 233                       beebe@xxxxxxx  beebe@xxxxxxxxxxxx -
- Salt Lake City, UT 84112-0090, USA    URL: http://www.math.utah.edu/~beebe/ -
-------------------------------------------------------------------------------