Autoconf manual's coverage of signed integer overflow & portability

Paul Eggert <eggert@xxxxxxxxxxx> · Tue, 02 Jan 2007 15:41:03 -0800

Today I updated the Autoconf manual to contain the following
description of the current situation with signed integer overflow.
This section of the manual is intended to advise programmers what to
do about portable C programs in this area.

I think some discussion along these lines also belongs in the GCC
manual.  As I understand it, though, some of the GCC developers are
loath to make any promises about GCC's behavior on signed overflow, so
the exact wording might be controversial.

Eventually I'd like to add better support for -fwrapv and the like to
Autoconf, but that can wait for further thought and experimentation.

Comments are welcome of course.

------

@node Integer Overflow
@section Integer Overflow
@cindex integer overflow
@cindex overflow, signed integer
@cindex signed integer overflow
@cindex wraparound arithmetic

Many portable C programs assume that signed integer overflow wraps
around reliably using two's complement arithmetic.  Yet the C standard
says that program behavior is undefined on overflow, and in a few cases
C programs do not work on some modern implementations because their
overflows do not wrap around as their authors intended.  Conversely, in
at least one common case related to overflow, the C standard requires
behavior that is commonly not implemented.

@menu
* Integer Overflow Basics::      Why integer overflow is a problem
* Signed Overflow Examples::     Examples of code assuming wraparound
* Optimization and Wraparound::  Optimizations that break uses of wraparound
* Signed Overflow Advice::       Practical advice for signed overflow issues
* Signed Integer Division::      @code{INT_MIN / -1} and @code{INT_MIN % -1}
@end menu

@node Integer Overflow Basics
@subsection Basics of Integer Overflow
@cindex integer overflow
@cindex overflow, signed integer
@cindex signed integer overflow
@cindex wraparound arithmetic

In languages like C, unsigned integer overflow reliably wraps around
modulo the word size.  This is guaranteed by the C standard and is
portable in practice, unless you specify aggressive optimization options
suitable only for special applications.

In contrast, the C standard says that signed integer overflow leads to
undefined behavior where a program can do anything, including dumping
core or overrunning a buffer.  The misbehavior can even precede the
overflow.  Such an overflow can occur during addition, subtraction,
multiplication, division, and left shift.

Despite this requirement of the standard, many C programs and Autoconf
tests assume that signed integer overflow silently wraps around modulo a
power of two, using two's complement arithmetic, so long as you cast the
resulting value to a signed integer type or store it into a signed
integer variable.  If you use conservative optimization flags, such
programs are generally portable to the vast majority of modern
platforms, with a few exceptions discussed later.

For historical reasons the C standard also allows implementations with
ones' complement or signed magnitude arithmetic, but it is safe to
assume two's complement nowadays.

@node Signed Overflow Examples
@subsection Examples of Code Assuming Wraparound Overflow
@cindex integer overflow
@cindex overflow, signed integer
@cindex signed integer overflow
@cindex wraparound arithmetic

There has long been a tension between what the C standard requires for
signed integer overflow, and what C programs commonly assume.  The
standard allows aggressive optimizations based on assumptions that
overflow never occurs, but many practical C programs rely on overflow
wrapping around.  These programs do not conform to the standard, but
they commonly work in practice because compiler writers are
understandably reluctant to implement optimizations that would break
many programs, unless perhaps a user specifies aggressive optimization.

The C Standard says that if a program has signed integer overflow its
behavior is undefined, and the undefined behavior can even precede the
overflow.  To take an extreme example:

@c Inspired by Robert Dewar's example in
@c <http://gcc.gnu.org/ml/gcc/2007-01/msg00038.html> (2007-01-01).
@example
if (password == expected_password)
  allow_superuser_privileges ();
else
  printf ("%d password mismatches\n", counter++);
@end example

@noindent
If @code{counter} is an @code{int} and a compiler can deduce that
@code{counter == INT_MAX} or that @code{counter} previously overflowed,
the C standard allows the compiler to optimize away the password test
and generate code that allows superuser privileges unconditionally.

Despite this requirement by the standard, it has long been common for C
code to assume wraparound arithmetic after signed overflow, and all
known practical C implementations support some C idioms that assume
wraparound signed arithmetic, even if the idioms does not conform
strictly to the standard.  If your code looks like the following
examples it will almost surely work with real-world compilers.

Here is an example derived from the 7th Edition Unix implementation of
@code{atoi} (1979-01-10):

@example
char *p;
int f, n;
@dots{}
while (*p >= '0' && *p <= '9')
  n = n * 10 + *p++ - '0';
return (f ? -n : n);
@end example

@noindent
Even if the input string is in range, on most modern machines this has
signed overflow when computing the most negative integer (the @code{-n}
overflows) or a value near an extreme integer (the first @code{+}
overflows).

Here is another example, taken from the 7th Edition implementation of
@code{rand} (1979-01-10).  Here the programmer expects both
multiplication and addition to wrap on overflow:

@example
static long int randx = 1;
@dots{}
randx = randx * 1103515245 + 12345;
return (randx >> 16) & 077777;
@end example

In the following example, derived from the @acronym{GNU} C Library 2.5
implementation of @code{mktime} (2006-09-09), the code assumes
wraparound arithmetic in @code{+} to detect signed overflow:

@example
time_t t, t1, t2;
int sec_requested, sec_adjustment;
@dots{}
t1 = t + sec_requested;
t2 = t1 + sec_adjustment;
if (((t1 < t) != (sec_requested < 0))
    | ((t2 < t1) != (sec_adjustment < 0)))
  return -1;
@end example

If your code looks like these examples, it is probably safe even though
it does not strictly conform to the C standard.  This might lead one to
believe that one can generally assume wraparound on overflow, but that
is not always true, as can be seen in the next section.

@node Optimization and Wraparound
@subsection Optimizations That Break Wraparound Arithmetic
@cindex loop induction

Compilers sometimes generate code that is incompatible with wraparound
integer arithmetic.  A simple example is an algebraic simplification: a
compiler might translate @code{(i * 2000) / 1000} to @code{i * 2}
because it assumes that @code{i * 2000} does not overflow.  The
translation is not equivalent to the original when overflow occurs:
e.g., in the typical case of 32-bit signed two's complement wraparound
@code{int}, if @code{i} has type @code{int} and value @code{1073742},
the original expression returns @minus{}2147483 but the optimized
version returns the mathematically correct value 2147484.

More subtly, loop induction optimizations often exploit the undefined
behavior of signed overflow.  Consider the following contrived function
@code{sumc}:

@example
int
sumc (int lo, int hi)
@{
  int sum = 0;
  int i;
  for (i = lo; i <= hi; i++)
    sum ^= i * 53;
  return sum;
@}
@end example

@noindent
To avoid multiplying by 53 each time through the loop, an optimizing
compiler might internally transform @code{sumc} to the equivalent of the
following:

@example
int
transformed_sumc (int lo, int hi)
@{
  int sum = 0;
  int hic = hi * 53;
  int ic;
  for (ic = lo * 53; ic <= hic; ic += 53)
    sum ^= ic;
  return sum;
@}
@end example

@noindent
This transformation is allowed by the C standard, but it is invalid for
wraparound arithmetic when @code{INT_MAX / 53 < hi}, because then the
overflow in computing expressions like @code{hi * 53} can cause the
expression @code{i <= hi} to yield a different value from the
transformed expression @code{ic <= hic}.

For this reason, compilers that use loop induction and similar
techniques often do not support reliable wraparound arithmetic when a
loop induction variable like @code{ic} is involved.  Since loop
induction variables are generated by the compiler, and are not visible
in the source code, it is not always trivial to say whether the problem
affects your code.

Hardly any code actually depends on wraparound arithmetic in cases like
these, so in practice these loop induction optimizations are almost
always useful.  However, edge cases in this area can cause problems.
For example:

@example
int j;
for (j = 1; 0 < j; j *= 2)
  test (j);
@end example

@noindent
Here, the loop attempts to iterate through all powers of 2 that
@code{int} can represent, but some test versions of @acronym{GCC}
optimize away the comparison to zero and thus generate an infinite loop,
under the argument that behavior is undefined on overflow.  As of this
writing this optimization is not done by any production version of
@acronym{GCC} with @option{-O2}, but it might be performed by more
aggressive @acronym{GCC} optimization options, or by other compilers.

@node Signed Overflow Advice
@subsection Practical Advice for Signed Overflow Issues
@cindex integer overflow
@cindex overflow, signed integer
@cindex signed integer overflow
@cindex wraparound arithmetic

Ideally the safest approach is to avoid signed integer overflow
entirely.  For example, instead of multiplying two signed integers, you
can convert them to unsigned integers, multiply the unsigned values,
then test whether the result is in signed range.

Rewriting code in this way will be inconvenient, though, particularly if
the signed values might be negative.  Also, it will probably hurt
performance.  Using unsigned arithmetic to check for overflow is
particularly painful to do portably and efficiently when dealing with an
integer type like @code{uid_t} whose width and signedness vary from
platform to platform.

Furthermore, many C applications pervasively assume wraparound behavior
and typically it is not easy to find and remove all these assumptions.
Hence it is often useful to maintain nonstandard code that assumes
wraparound on overflow, instead of rewriting the code.  The rest of this
section attempts to give practical advice for this situation.

If your code uses a signed loop index, make sure that the index cannot
overflow, along with all signed expressions derived from the index.
Here is a contrived example of problematic code with two instances of
overflow.

@example
for (i = INT_MAX - 10; i <= INT_MAX; i++)
  if (i + 1 < 0)
    @{
      report_overflow ();
      break;
    @}
@end example

@noindent
Because of the two overflows, a compiler might optimize away or
transform the two comparisons in a way that is incompatible with the
wraparound assumption.

If your code uses an expression like @code{(i * 2000) / 1000} and you
actually want the multiplication to wrap around reliably, put the
product into a temporary variable and divide that by 1000.  This
inhibits the algebraic optimization on many platforms.

If your code assumes wraparound behavior and you want to insulate it
against any @acronym{GCC} optimizations that would fail to support that
behavior, you should use @acronym{GCC}'s @option{-fwrapv} option, which
causes signed overflow to wrap around reliably (except for division and
remainder, as discussed in the next section).

If you need to port to platforms where signed integer overflow does not
reliably wrap around (e.g., due to hardware overflow checking, or to
highly aggressive optimizations), you should consider using
@acronym{GCC}'s @option{-ftrapv} option, which causes signed overflow to
raise an exception.

@node Signed Integer Division
@subsection Signed Integer Division and Integer Overflow
@cindex division, integer

Overflow in signed
integer division is not always harmless: for example, on CPUs of the
i386 family, dividing @code{INT_MIN} by @code{-1} yields a SIGFPE signal
which by default terminates the program.  Worse, taking the remainder
of these two values typically yields the same signal on these CPUs,
even though the C standard requires @code{INT_MIN % -1} to yield zero
because the expression does not overflow.

_______________________________________________
Autoconf mailing list
Autoconf@xxxxxxx
http://lists.gnu.org/mailman/listinfo/autoconf