On 06/11/15 14:42, Andrew Haley wrote:
> On 11/06/2015 12:32 PM, David Brown wrote:
>
>> Is it at least possible to put some restrictions or guarantees on
>> how far the compiler will go in its optimisations, or how far it
>> will go without warnings?
>
> AFAICS that is the intention of Appendix L.  I don't know how to
> bound undefined behaviour in the way it suggests; it may turn out
> to be difficult.
>
>> I don't want to limit the generation of code, I want it to be
>> harder for there to be a mismatch between what the compiler
>> generates and what the programmer intended.
>
> I don't think you can have both.

I think it is possible - warnings (especially ones enabled by -Wall,
rather than needing explicit options) can be given when optimisation
changes code in certain ways.  But I can certainly see that minimising
false positives and false negatives is extremely difficult.

>> And when there /is/ a mismatch, I want it to be easier to find -
>> applying undefined behaviour backwards in the code can make it
>> extremely difficult for the programmer to find the real problem.
>
> The undefinedness of integer overflow allows us to transform
>
>     x*3/3 -> x
>
> It's not a bug, it's a feature.

I agree - it is a useful feature, allowing such transforms.  (A small
illustration of this folding follows below.)  However, when undefined
behaviour is also used to transform other code that logically precedes
the line with the (potentially) undefined behaviour, the usefulness
becomes less clear - and at the very least, warnings would help
programmers get their code right.

>> This is merely an example - there are many other possible undefined
>> behaviours.  And while it may be a simple solution to say "it's the
>> programmer's fault - /skilled/ programmers would not do that", the
>> fact is that there are many programmers out there who are not that
>> skilled.  The Principle of Least Astonishment is important here -
>> the compiler should avoid surprising programmers.
>
> C is a particular language which has been specified in a particular
> way in order to allow these kinds of optimizations.  If people don't
> want that kind of language there are plenty of others, most of which
> don't have the same properties.
>
> I really think you are talking to the wrong people.  If you want to
> control what optimizations are allowed in a C implementation you
> should appeal to the C standard technical committee, not compiler
> writers.  Appendix L is interesting, and perhaps is a way forward
> for those who agree with you.

I am not sure I want to /control/ the kind of optimisations allowed in
C - it's more that I want to understand them, and in particular I want
to understand how they may be implemented in future versions of
real-world compilers (gcc in particular).  There is no doubt that
compilers have been getting smarter (for which the gcc developers
deserve praise and thanks) - but that also opens more possibilities
for people to write code that they think is correct, but is actually
wrong.

So I believe I am talking to the correct people, although I might not
be expressing myself very well - I am interested in the
implementations of C here, rather than the standards.  It is the
implementations that have made more aggressive use of undefined
behaviour in recent years - the standards haven't changed in this
respect (except for Annex L).

Incidentally, what do you think of the possibility of a "dont_care()"
expression?  I believe it would let programmers make it clear to the
compiler that they are not interested in the exact result, but that
they do not want /undefined/ behaviour either.
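To make the x*3/3 point above concrete, here is a minimal sketch of my
own (not Andrew's code; -fwrapv is the real gcc option that makes
signed overflow wrap):

    /* Since signed overflow is undefined, gcc is free to fold the
       whole expression to plain "x": */
    int f(int x)
    {
        return x * 3 / 3;
    }

With -fwrapv, signed overflow is defined as wrapping, so the multiply
and divide have to stay - for x = 0x40000000, x*3 wraps to a negative
value and x*3/3 is no longer equal to x.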
We have agreed that the compiler can legally transform:

    int foo(int x)
    {
        if (x < 5000)
            bar(x);
        return x * x * x;
    }

into:

    int foo(int x)
    {
        bar(x);
        return x * x * x;
    }

gcc doesn't do that at the moment, and I hope that if and when it
performs such optimisations, it will be able to give a warning when
doing so, because the result can be so unexpected for the programmer
(along with code reviewers and static analysis tools) - suddenly it is
possible for bar() to be called with x >= 5000.

A safe alternative, assuming (as per the original spec in my first
post) we don't care what we get if x*x*x is not valid, would be:

    int foo(int x)
    {
        if (x < 5000)
            bar(x);
        if (x > 1290 || x < -1290)  /* |x| > 1290 overflows x*x*x */
            return 0;
        return x * x * x;
    }

But that means extra generated code - and the whole point is to
generate code that is as efficient as possible.  With a dont_care()
expression, we could have:

    int foo(int x)
    {
        if (x < 5000)
            bar(x);
        if (x > 1290 || x < -1290)
            return dont_care();
        return x * x * x;
    }

dont_care() would probably have to be a builtin - I can't think of any
way to produce this effect without one.  It would be an expression
somewhat similar to __builtin_unreachable().  With the dont_care(),
the compiler could happily generate the x*x*x using the cpu's multiply
instruction, knowing that the result is good enough for the user.  But
we would have turned the undefined behaviour into unspecified
behaviour - we want an int and don't care which one, but nasal daemons
and eliminating the check before bar() are no longer possible.

(I know there are other ways to implement this particular function
safely, including casts to long long, separating the sign bit from x
and using unsigned types, using __builtin_mul_overflow, etc.  But
these all get ugly very quickly, and may not lead to efficient object
code.)

Many thanks for your time and thoughts on this (as well as on the
compiler itself).
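P.S. For completeness, a sketch of the __builtin_mul_overflow route
mentioned above (the builtin itself is real, available since gcc 5;
foo_checked and the bar() prototype are just illustrative names).  It
returns 0 on overflow, like the "safe alternative" version:

    void bar(int);

    int foo_checked(int x)
    {
        int xx, xxx;

        if (x < 5000)
            bar(x);
        /* __builtin_mul_overflow() returns true (nonzero) if the
           product does not fit in the result object: */
        if (__builtin_mul_overflow(x, x, &xx)
            || __builtin_mul_overflow(xx, x, &xxx))
            return 0;   /* well-defined fallback on overflow */
        return xxx;
    }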