On 05/11/15 20:52, David Brown wrote:
> I would like to know where the gcc developers stand on this - how can
> the rules of the standards be interpreted? And how /should/ they be
> interpreted? Does gcc aim for the most efficient code, or for the
> principle of least astonishment?
>
> For example, consider this requirement:
>
> Write a function "int foo(int x)" that will call "bar(x)" if x is less
> than 50000. Then if x is less than 1000, return x*x*x - otherwise
> return any integer. (Assume 32-bit int's, no traps or other "funny
> stuff" on the target.)
>
> It seems reasonable to write:
>
> int foo(int x) {
>     if (x < 50000) bar(x);
>     return x*x*x;
> }
>
> For x greater than 1290, the cubing is going to overflow and generate a
> mathematically incorrect answer. But that's okay - we don't care about
> the result in such cases.
>
> The question is, can the compiler reason like this:
>
> If x > 1290, then undefined behaviour will occur. The user clearly
> doesn't care what happens when x > 1290. Therefore, we can assume that
> x <= 1290. Therefore, x < 50000. Therefore, there is no need to check
> the conditional, and bar(x) can be called directly.

That's right.

> The point here is that if the code hits undefined behaviour, then the
> behaviour of the entire program is undefined. Thus the undefined
> behaviour in "x*x*x" can work backwards.
>
> Equally clearly, this would be against the expectations of the
> programmer.

Well, with UB it's not really possible to know the expectations of the
programmer, and I'm not sure how much sense it makes to even try.

> And any attempt to avoid the undefined behaviour (such as changing the
> last line to "return (x < 1000) ? x*x*x : 0;") would lead to bigger
> and slower code than the "obvious" compilation of foo() above.
>
> (In tests, gcc and clang both keep the check on x < 50000.
> <https://gcc.godbolt.org/> is really handy!)
>
> How is the programmer to be sure that the compiler does not apply this
> optimisation logic?

If your code may overflow, use an unsigned type.

> Or consider this code:
>
> char arr[8192];
> int foo2(int i) {
>     if (i < 0) return -1;
>     i *= 3;
>     if (i < 0) return -2;
>     if (i >= 8192) return -3;
>     return arr[i]++;
> }
>
> The compiler can assume that since integer overflow is undefined
> behaviour, and it knows that i >= 0 after the first check, then the
> result of "i *= 3" must be non-negative.

Indeed it can.

> However, if you pass an integer over (2^31 / 3) to the function, it is
> likely that after the assembly code for i *= 3 is run, the value in
> the cpu register will be negative

You're thinking about this in the wrong way. C is not defined in terms
of registers: it is defined by an abstract machine. In terms of the
abstract machine, integer overflow is undefined. Perhaps the processor
overheats.

> and the obvious assembly for the "i >= 8192" test will not trigger.
> The result is access (and modification) to memory well outside the
> bounds of arr, despite the clear safety check in the code. The
> compiler would be allowed to do this even if it follows the
> Analyzability Annex L in the C11 standard - and it would be turning a
> "bounded undefined behaviour" (integer overflow) into "unbounded
> undefined behaviour" (modification of memory outside of bounds).

Indeed so. Annex L is interesting, but ensuring that integer overflow
does not result in wild stores will defeat some optimizations.
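
If you want that last range check to stay meaningful, reject the
overflow before the multiplication instead of relying on a test after
it (or do the arithmetic in an unsigned type, where wraparound is
defined). Something along these lines - untested, and foo2_checked is
just my name for it:

#include <limits.h>

char arr[8192];                       /* as in your example */

/* Variation on foo2: the overflow is rejected *before* the
   multiplication, so there is no undefined behaviour for the compiler
   to reason from, and the i >= 8192 check has to be kept. */
int foo2_checked(int i) {
    if (i < 0) return -1;
    if (i > INT_MAX / 3) return -2;   /* i * 3 would overflow int */
    i *= 3;                           /* cannot overflow now */
    if (i >= 8192) return -3;
    return arr[i]++;
}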
> In this second example, the skipped tests come /after/ the undefined
> behaviour, but it could still easily be seen as unexpected behaviour
> from the compiler.

Unexpected by whom? Skilled C programmers know that signed types have
this property, but unsigned types wrap.

Andrew.
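
P.S. A concrete way to see the difference (the function names are
mine, and the gcc behaviour described in the comments is what I would
expect at -O2):

/* Unsigned arithmetic wraps modulo 2^N by definition, so the compiler
   has to keep this comparison: it is true exactly when x is the
   largest unsigned value. */
int wraps_u(unsigned x) {
    return x + 1 < x;
}

/* Signed overflow is undefined, so the compiler may assume x + 1 does
   not overflow; gcc at -O2 will typically fold this to "return 0"
   (and some versions will warn about it with -Wstrict-overflow). */
int wraps_s(int x) {
    return x + 1 < x;
}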