On 06/11/15 14:42, Andrew Haley wrote:
> On 11/06/2015 12:32 PM, David Brown wrote:
>
>> Is it at least possible to put some restrictions or guarantees on
>> how far the compiler will go in its optimisations, or how far it
>> will go without warnings?
>
> AFAICS that is the intention of Appendix L.  I don't know how to
> bound undefined behaviour in the way it suggests; it may turn out
> to be difficult.
>
>> I don't want to limit the generation of code, I want it to be
>> harder for there to be a mismatch between what the compiler
>> generates and what the programmer intended.
>
> I don't think you can have both.

I think it is possible - warnings (especially ones enabled by -Wall,
rather than needing explicit options) can be given when optimisation
changes code in certain ways.  But I can certainly see that minimising
false positives and false negatives is extremely difficult.

>> And when there /is/ a mismatch, I want it to be easier to find -
>> applying undefined behaviour backwards in the code can make it
>> extremely difficult for the programmer to find the real problem.
>
> The undefinedness of integer overflow allows us to transform
>
>     x*3/3 -> x
>
> It's not a bug, it's a feature.

I agree - it is a useful feature, allowing such transforms.  (A small
illustration of this folding follows below.)  However, when undefined
behaviour is also used to transform other code that logically precedes
the line with the (potentially) undefined behaviour, the usefulness
becomes less clear - and at the very least, warnings would help
programmers get their code right.

>> This is merely an example - there are many other possible undefined
>> behaviours.  And while it may be a simple solution to say "it's the
>> programmer's fault - /skilled/ programmers would not do that", the
>> fact is that there are many programmers out there who are not that
>> skilled.  The Principle of Least Astonishment is important here -
>> the compiler should avoid surprising programmers.
>
> C is a particular language which has been specified in a particular
> way in order to allow these kinds of optimizations.  If people don't
> want that kind of language there are plenty of others, most of which
> don't have the same properties.
>
> I really think you are talking to the wrong people.  If you want to
> control what optimizations are allowed in a C implementation you
> should appeal to the C standard technical committee, not compiler
> writers.  Appendix L is interesting, and perhaps is a way forward
> for those who agree with you.

I am not sure I want to /control/ the kind of optimisations allowed in
C - it's more that I want to understand them, and in particular I want
to understand how they may be implemented in future versions of
real-world compilers (gcc in particular).  There is no doubt that
compilers have been getting smarter (for which the gcc developers
deserve praise and thanks) - but that also opens more possibilities
for people to write code that they think is correct, but is actually
wrong.

So I believe I am talking to the correct people, although I might not
be expressing myself very well - I am interested in the
implementations of C here, rather than the standards.  It is the
implementations that have made more aggressive use of undefined
behaviour in recent years - the standards haven't changed in this
respect (except for Annex L).

Incidentally, what do you think of the possibility of a "dont_care()"
expression?  I believe it would let programmers make it clear to the
compiler that they are not interested in the exact result, but that
they do not want /undefined/ behaviour either.
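To make the x*3/3 point above concrete, here is a minimal sketch of my
own (not Andrew's code; -fwrapv is the real gcc option that makes
signed overflow wrap):

    /* Since signed overflow is undefined, gcc is free to fold the
       whole expression to plain "x": */
    int f(int x)
    {
        return x * 3 / 3;
    }

With -fwrapv, signed overflow is defined as wrapping, so the multiply
and divide have to stay - for x = 0x40000000, x*3 wraps to a negative
value and x*3/3 is no longer equal to x.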
We have agreed that the compiler can legally transform:

    int foo(int x)
    {
        if (x < 5000)
            bar(x);
        return x * x * x;
    }

into:

    int foo(int x)
    {
        bar(x);
        return x * x * x;
    }

gcc doesn't do that at the moment, and I hope that if and when it
performs such optimisations, it will be able to give a warning when
doing so, because the result can be so unexpected for the programmer
(along with code reviewers and static analysis tools) - suddenly it is
possible for bar() to be called with x >= 5000.

A safe alternative, assuming (as per the original spec in my first
post) we don't care what we get if x*x*x is not valid, would be:

    int foo(int x)
    {
        if (x < 5000)
            bar(x);
        if (x > 1290 || x < -1290)  /* |x| > 1290 overflows x*x*x */
            return 0;
        return x * x * x;
    }

But that means extra generated code - and the whole point is to
generate code that is as efficient as possible.  With a dont_care()
expression, we could have:

    int foo(int x)
    {
        if (x < 5000)
            bar(x);
        if (x > 1290 || x < -1290)
            return dont_care();
        return x * x * x;
    }

dont_care() would probably have to be a builtin - I can't think of any
way to produce this effect without one.  It would be an expression
somewhat similar to __builtin_unreachable().  With the dont_care(),
the compiler could happily generate the x*x*x using the cpu's multiply
instruction, knowing that the result is good enough for the user.  But
we would have turned the undefined behaviour into unspecified
behaviour - we want an int and don't care which one, but nasal daemons
and eliminating the check before bar() are no longer possible.

(I know there are other ways to implement this particular function
safely, including casts to long long, separating the sign bit from x
and using unsigned types, using __builtin_mul_overflow, etc.  But
these all get ugly very quickly, and may not lead to efficient object
code.)

Many thanks for your time and thoughts on this (as well as on the
compiler itself).
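P.S. For completeness, a sketch of the __builtin_mul_overflow route
mentioned above (the builtin itself is real, available since gcc 5;
foo_checked and the bar() prototype are just illustrative names).  It
returns 0 on overflow, like the "safe alternative" version:

    void bar(int);

    int foo_checked(int x)
    {
        int xx, xxx;

        if (x < 5000)
            bar(x);
        /* __builtin_mul_overflow() returns true (nonzero) if the
           product does not fit in the result object: */
        if (__builtin_mul_overflow(x, x, &xx)
            || __builtin_mul_overflow(xx, x, &xxx))
            return 0;   /* well-defined fallback on overflow */
        return xxx;
    }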