Today I updated the Autoconf manual to contain the following description of the current situation with signed integer overflow. This section of the manual is intended to advise programmers what to do about portable C programs in this area. I think some discussion along these lines also belongs in the GCC manual. As I understand it, though, some of the GCC developers are loath to make any promises about GCC's behavior on signed overflow, so the exact wording might be controversial. Eventually I'd like to add better support for -fwrapv and the like to Autoconf, but that can wait for further thought and experimentation. Comments are welcome of course. ------ @node Integer Overflow @section Integer Overflow @cindex integer overflow @cindex overflow, signed integer @cindex signed integer overflow @cindex wraparound arithmetic Many portable C programs assume that signed integer overflow wraps around reliably using two's complement arithmetic. Yet the C standard says that program behavior is undefined on overflow, and in a few cases C programs do not work on some modern implementations because their overflows do not wrap around as their authors intended. Conversely, in at least one common case related to overflow, the C standard requires behavior that is commonly not implemented. @menu * Integer Overflow Basics:: Why integer overflow is a problem * Signed Overflow Examples:: Examples of code assuming wraparound * Optimization and Wraparound:: Optimizations that break uses of wraparound * Signed Overflow Advice:: Practical advice for signed overflow issues * Signed Integer Division:: @code{INT_MIN / -1} and @code{INT_MIN % -1} @end menu @node Integer Overflow Basics @subsection Basics of Integer Overflow @cindex integer overflow @cindex overflow, signed integer @cindex signed integer overflow @cindex wraparound arithmetic In languages like C, unsigned integer overflow reliably wraps around modulo the word size. This is guaranteed by the C standard and is portable in practice, unless you specify aggressive optimization options suitable only for special applications. In contrast, the C standard says that signed integer overflow leads to undefined behavior where a program can do anything, including dumping core or overrunning a buffer. The misbehavior can even precede the overflow. Such an overflow can occur during addition, subtraction, multiplication, division, and left shift. Despite this requirement of the standard, many C programs and Autoconf tests assume that signed integer overflow silently wraps around modulo a power of two, using two's complement arithmetic, so long as you cast the resulting value to a signed integer type or store it into a signed integer variable. If you use conservative optimization flags, such programs are generally portable to the vast majority of modern platforms, with a few exceptions discussed later. For historical reasons the C standard also allows implementations with ones' complement or signed magnitude arithmetic, but it is safe to assume two's complement nowadays. @node Signed Overflow Examples @subsection Examples of Code Assuming Wraparound Overflow @cindex integer overflow @cindex overflow, signed integer @cindex signed integer overflow @cindex wraparound arithmetic There has long been a tension between what the C standard requires for signed integer overflow, and what C programs commonly assume. The standard allows aggressive optimizations based on assumptions that overflow never occurs, but many practical C programs rely on overflow wrapping around. These programs do not conform to the standard, but they commonly work in practice because compiler writers are understandably reluctant to implement optimizations that would break many programs, unless perhaps a user specifies aggressive optimization. The C Standard says that if a program has signed integer overflow its behavior is undefined, and the undefined behavior can even precede the overflow. To take an extreme example: @c Inspired by Robert Dewar's example in @c <http://gcc.gnu.org/ml/gcc/2007-01/msg00038.html> (2007-01-01). @example if (password == expected_password) allow_superuser_privileges (); else printf ("%d password mismatches\n", counter++); @end example @noindent If @code{counter} is an @code{int} and a compiler can deduce that @code{counter == INT_MAX} or that @code{counter} previously overflowed, the C standard allows the compiler to optimize away the password test and generate code that allows superuser privileges unconditionally. Despite this requirement by the standard, it has long been common for C code to assume wraparound arithmetic after signed overflow, and all known practical C implementations support some C idioms that assume wraparound signed arithmetic, even if the idioms does not conform strictly to the standard. If your code looks like the following examples it will almost surely work with real-world compilers. Here is an example derived from the 7th Edition Unix implementation of @code{atoi} (1979-01-10): @example char *p; int f, n; @dots{} while (*p >= '0' && *p <= '9') n = n * 10 + *p++ - '0'; return (f ? -n : n); @end example @noindent Even if the input string is in range, on most modern machines this has signed overflow when computing the most negative integer (the @code{-n} overflows) or a value near an extreme integer (the first @code{+} overflows). Here is another example, taken from the 7th Edition implementation of @code{rand} (1979-01-10). Here the programmer expects both multiplication and addition to wrap on overflow: @example static long int randx = 1; @dots{} randx = randx * 1103515245 + 12345; return (randx >> 16) & 077777; @end example In the following example, derived from the @acronym{GNU} C Library 2.5 implementation of @code{mktime} (2006-09-09), the code assumes wraparound arithmetic in @code{+} to detect signed overflow: @example time_t t, t1, t2; int sec_requested, sec_adjustment; @dots{} t1 = t + sec_requested; t2 = t1 + sec_adjustment; if (((t1 < t) != (sec_requested < 0)) | ((t2 < t1) != (sec_adjustment < 0))) return -1; @end example If your code looks like these examples, it is probably safe even though it does not strictly conform to the C standard. This might lead one to believe that one can generally assume wraparound on overflow, but that is not always true, as can be seen in the next section. @node Optimization and Wraparound @subsection Optimizations That Break Wraparound Arithmetic @cindex loop induction Compilers sometimes generate code that is incompatible with wraparound integer arithmetic. A simple example is an algebraic simplification: a compiler might translate @code{(i * 2000) / 1000} to @code{i * 2} because it assumes that @code{i * 2000} does not overflow. The translation is not equivalent to the original when overflow occurs: e.g., in the typical case of 32-bit signed two's complement wraparound @code{int}, if @code{i} has type @code{int} and value @code{1073742}, the original expression returns @minus{}2147483 but the optimized version returns the mathematically correct value 2147484. More subtly, loop induction optimizations often exploit the undefined behavior of signed overflow. Consider the following contrived function @code{sumc}: @example int sumc (int lo, int hi) @{ int sum = 0; int i; for (i = lo; i <= hi; i++) sum ^= i * 53; return sum; @} @end example @noindent To avoid multiplying by 53 each time through the loop, an optimizing compiler might internally transform @code{sumc} to the equivalent of the following: @example int transformed_sumc (int lo, int hi) @{ int sum = 0; int hic = hi * 53; int ic; for (ic = lo * 53; ic <= hic; ic += 53) sum ^= ic; return sum; @} @end example @noindent This transformation is allowed by the C standard, but it is invalid for wraparound arithmetic when @code{INT_MAX / 53 < hi}, because then the overflow in computing expressions like @code{hi * 53} can cause the expression @code{i <= hi} to yield a different value from the transformed expression @code{ic <= hic}. For this reason, compilers that use loop induction and similar techniques often do not support reliable wraparound arithmetic when a loop induction variable like @code{ic} is involved. Since loop induction variables are generated by the compiler, and are not visible in the source code, it is not always trivial to say whether the problem affects your code. Hardly any code actually depends on wraparound arithmetic in cases like these, so in practice these loop induction optimizations are almost always useful. However, edge cases in this area can cause problems. For example: @example int j; for (j = 1; 0 < j; j *= 2) test (j); @end example @noindent Here, the loop attempts to iterate through all powers of 2 that @code{int} can represent, but some test versions of @acronym{GCC} optimize away the comparison to zero and thus generate an infinite loop, under the argument that behavior is undefined on overflow. As of this writing this optimization is not done by any production version of @acronym{GCC} with @option{-O2}, but it might be performed by more aggressive @acronym{GCC} optimization options, or by other compilers. @node Signed Overflow Advice @subsection Practical Advice for Signed Overflow Issues @cindex integer overflow @cindex overflow, signed integer @cindex signed integer overflow @cindex wraparound arithmetic Ideally the safest approach is to avoid signed integer overflow entirely. For example, instead of multiplying two signed integers, you can convert them to unsigned integers, multiply the unsigned values, then test whether the result is in signed range. Rewriting code in this way will be inconvenient, though, particularly if the signed values might be negative. Also, it will probably hurt performance. Using unsigned arithmetic to check for overflow is particularly painful to do portably and efficiently when dealing with an integer type like @code{uid_t} whose width and signedness vary from platform to platform. Furthermore, many C applications pervasively assume wraparound behavior and typically it is not easy to find and remove all these assumptions. Hence it is often useful to maintain nonstandard code that assumes wraparound on overflow, instead of rewriting the code. The rest of this section attempts to give practical advice for this situation. If your code uses a signed loop index, make sure that the index cannot overflow, along with all signed expressions derived from the index. Here is a contrived example of problematic code with two instances of overflow. @example for (i = INT_MAX - 10; i <= INT_MAX; i++) if (i + 1 < 0) @{ report_overflow (); break; @} @end example @noindent Because of the two overflows, a compiler might optimize away or transform the two comparisons in a way that is incompatible with the wraparound assumption. If your code uses an expression like @code{(i * 2000) / 1000} and you actually want the multiplication to wrap around reliably, put the product into a temporary variable and divide that by 1000. This inhibits the algebraic optimization on many platforms. If your code assumes wraparound behavior and you want to insulate it against any @acronym{GCC} optimizations that would fail to support that behavior, you should use @acronym{GCC}'s @option{-fwrapv} option, which causes signed overflow to wrap around reliably (except for division and remainder, as discussed in the next section). If you need to port to platforms where signed integer overflow does not reliably wrap around (e.g., due to hardware overflow checking, or to highly aggressive optimizations), you should consider using @acronym{GCC}'s @option{-ftrapv} option, which causes signed overflow to raise an exception. @node Signed Integer Division @subsection Signed Integer Division and Integer Overflow @cindex division, integer Overflow in signed integer division is not always harmless: for example, on CPUs of the i386 family, dividing @code{INT_MIN} by @code{-1} yields a SIGFPE signal which by default terminates the program. Worse, taking the remainder of these two values typically yields the same signal on these CPUs, even though the C standard requires @code{INT_MIN % -1} to yield zero because the expression does not overflow. _______________________________________________ Autoconf mailing list Autoconf@xxxxxxx http://lists.gnu.org/mailman/listinfo/autoconf