On 10/11/15 10:44, David Brown wrote:
> On 10/11/15 11:16, Richard Earnshaw wrote:
>> On 10/11/15 09:20, David Brown wrote:
>>> On 09/11/15 17:31, Andrew Haley wrote:
>>>> On 11/09/2015 03:56 PM, David Brown wrote:
>>>>
>>>>> We typically cannot use "sanitize" options, nor can we accept that
>>>>> a bug in one part of the program causes undue and unnecessarily
>>>>> damaging side-effects in other parts.
>>>>
>>>> Well, you have to get used to that.  It is reality: computers work
>>>> that way.  I'm sure you know that if you hit the wrong I/O port
>>>> with a wild write, odd things will happen.  Whether that's "undue"
>>>> or "unnecessary" I couldn't say: it just is.
>>>
>>> There is no doubt that with C, it is easy to make mistakes that can
>>> lead to disaster.  One little typo, and you access an array outside
>>> its bounds - corrupting a completely independent piece of data, and
>>> showing up as a problem in a different part of the program that was
>>> already tested and working fine.  These things are part of the risk
>>> of C programming - the cost you pay for maximum speed and
>>> efficiency.
>>>
>>> But let's look back at Annex L in the C11 standard, and ask what it
>>> is /for/.  It is labelled "Analyzability" - the aim is to help
>>> developers (and code reviewers, and static analysers) figure out
>>> whether code is correct.  Part of that is to place bounds on the
>>> effects of some types of undefined behaviour.  Now, you can
>>> certainly argue that Annex L doesn't go far enough to be useful,
>>> that it is too vague, that it is impractical or limiting to
>>> implement, or that knock-on effects can quickly turn bounded
>>> undefined behaviour into unbounded undefined behaviour.  But the
>>> very existence of this annex in the C standard shows how important
>>> these things are.
>>>
>>> To my mind, if execution hits UB, then it's a bug in the program.
>>> It doesn't really matter whether the bug is UB, an error in the
>>> algorithm, a coding mistake, or even a mistake in the customer's
>>> requirement specification.  It is a point where the computer does
>>> not do what the user expects it to do.  Is it so unreasonable to
>>> want code that is correct up to that bug to run correctly until the
>>> bug is hit?  Is it unreasonable to want the compiler to warn if it
>>> sees such a bug and uses it to change the earlier actions of the
>>> code?
>>>
>>> I know that once code hits the bug, all sorts of things can go
>>> wrong.  All I ask is that the compiler should not make things worse
>>> - at least not without informing the developer.
>>>
>>>> C definitely works that way.  Maybe there should be a nice small
>>>> language which is useful for embedded developers and doesn't have
>>>> all the interesting UB properties that C has.  (Ada, maybe?
>>>> Probably not.)  Maybe you could define a language compatible with
>>>> C with the UB removed.  But defining the semantics of such a
>>>> language would not be easy.  And I don't think it makes much sense
>>>> to change GCC without such a rigorous language definition.
>>>
>>> I don't believe it is possible to make a general and efficient
>>> programming language without UB.  And even if UB could be
>>> eliminated at the language level, it would still exist at a higher
>>> level - a square root function could well have undefined behaviour
>>> for negative inputs.
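>>>
>>> As a purely illustrative sketch of that kind of library-level
>>> contract (hypothetical code, not from any real library):
>>>
>>>     /* Precondition: x >= 0.  The function does not check, so for
>>>        negative inputs the result is meaningless garbage "by
>>>        contract" - the same shape as a language-level UB rule, one
>>>        level up. */
>>>     double my_sqrt(double x)
>>>     {
>>>         double r = (x > 1.0) ? x : 1.0;  /* guess >= sqrt(x) */
>>>         for (int i = 0; i < 32; i++)
>>>             r = 0.5 * (r + x / r);       /* Newton-Raphson step */
>>>         return r;
>>>     }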
>>>
>>> I am also happy that gcc can make many small optimisations based on
>>> its knowledge of UB - strength reduction and constant folding in
>>> integer expressions are key examples.
>>>
>>> But I can also see the potential for optimisations based on UB to
>>> make developers' lives much more difficult by working logically
>>> backwards in time.
>>
>> You have to remember that a lot (but by no means all) of the UB in C
>> comes from the desire for an efficient language that is portable to
>> many architectures.  C has to support 'int' of unspecified maximum
>> size, machines with two's complement, one's complement or
>> sign-magnitude arithmetic, and multiple different floating point
>> formats.  The way the language committee enabled such support,
>> without requiring the tools to produce grossly inefficient code, was
>> to say that if you step outside the limits of the machine you are
>> running on, the behaviour is undefined.  However, it didn't
>> specifically qualify what had to happen in those cases, and didn't
>> distinguish them from other cases where UB can occur; it just said:
>> if this happens, your program is broken.  Furthermore, it said quite
>> clearly that compilers do not HAVE to detect such brokenness -
>> because in the extreme cases that would be very expensive, or even
>> impossible, to prove.
>
> I understand that - and I agree with the principle behind the use of
> UB in the C standards.  (I think support for some of the outdated
> hardware, such as anything other than two's complement signed
> integers, could be deprecated in newer C standards - but that's
> another issue.)
>
> However, just because the C standards make certain behaviour
> undefined does not mean that C /compilers/ must leave it that way.
> In particular, some undefined behaviour could usefully be turned into
> unspecified behaviour, retaining much of the optimisation flexibility
> but with less of the danger.  I think Annex L could be described that
> way.  Some types of undefined behaviour, such as accessing arrays out
> of bounds, cannot reasonably be bounded or limited in any way - you
> may be writing over some other data, and the results are completely
> unpredictable.
>
> But consider if integer overflow were made "unspecified" rather than
> "undefined" - i.e., it were guaranteed to return a valid integer, but
> you would know nothing about /which/ integer.  For efficiency, that
> would require hardware without enforced traps on overflow.  The
> compiler would still be able to optimise "x * 3 / 6" to "x / 2", and
> still be able to assume that "x + 1 > x".  But given "x = 1290;
> foo(); y = x * x * x;" it would no longer be able to eliminate the
> call to "foo".

I /think/ you've just said that x86 can no longer use its divide
instruction.  That instruction faults on n/0 and on INT_MIN/-1.

R.

> (It is quite possible that my analysis above only applies in some
> circumstances, not others - I haven't attempted to think through a
> range of cases, but merely to give an example of how it is possible
> to generate efficient code without surprising the programmer /too/
> much.)
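>
> To make the "backwards in time" worry concrete, here is the sort of
> thing I have in mind - a sketch only, assuming a 32-bit int and a
> value that genuinely overflows, and without my having checked what
> any particular gcc version actually does with it:
>
>     extern void foo(void);      /* opaque; has side effects */
>     extern int y;
>
>     int wraps(int x)
>     {
>         return x + 1 > x;       /* UB overflow lets this fold to 1 */
>     }
>
>     void bar(void)
>     {
>         int x = 100000;
>         foo();
>         y = x * x;              /* 100000 * 100000 overflows: UB */
>     }
>
> Under "undefined" rules, a compiler that can also prove foo() returns
> normally may treat bar() as never validly reached, and in principle
> drop the call to foo() - the bug reaches backwards past code that was
> correct.  Under "unspecified" rules, y would simply receive an
> unknown value, and foo() would still have to be called.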
>
>> (I would add at this point that it is also very hard at times to
>> prove that something would invoke UB, so the ability to emit a
>> diagnostic can depend on previous optimizations done on the code
>> before the check is performed -- you only have to look at the
>> history of -Wmaybe-uninitialized to realize how hard it is to get
>> things like this right; and false positives can be as painful for
>> developers as missing warnings, or warnings that seem to appear and
>> disappear at random.)
>
> I appreciate that this is very hard, and that achieving a useful
> balance of warnings with few false positives is often a difficult
> process.
>
>>> (Note that in all my testing, I have not found gcc to perform any
>>> of these unwanted "optimisations" that I fear - but I haven't found
>>> good reason to be sure that they won't be performed in the future.)
>>
>> And I doubt we would ever make such a guarantee.  It might severely
>> hamper GCC's stated goal of being a production-quality optimizing
>> compiler.
>
> gcc is already a highly optimising compiler - if it is not
> "production quality", then I don't know any compiler that /is/
> "production quality".  And I support the goal of making gcc ever
> better.
>
> But the goal of efficient code generation must not be the only one,
> and gcc must not lose sight of other important goals.  For the
> programmer, writing /correct/ code must always outweigh writing
> /fast/ code - and the compiler should be a tool that aids the
> programmer.
>
>>> And yes, I realise that taken to extremes this is ultimately an
>>> impossible task.  The point is merely to aim in that direction.  I
>>> don't expect gcc to warn on every possible error - but I am happy
>>> to see it warning on more and more possible errors.  What I don't
>>> want to see is the attitude that, since it is impossible to know
>>> /exactly/ what the programmer wanted when there is UB in the code,
>>> the compiler can refuse all responsibility for trying.  That would
>>> IMHO be a disservice to the user, making it harder to find and fix
>>> bugs.
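P.S. The classic shape that makes -Wmaybe-uninitialized so sensitive
to pass ordering is something like this (illustrative only - whether a
given gcc release warns here depends on how much of the correlation
between the two tests the optimizers have worked out before the
uninitialized-use check runs):

    int f(int flag)
    {
        int x;              /* only set when flag is non-zero */
        if (flag)
            x = 1;
        /* ... intervening code ... */
        if (flag)
            return x;       /* never actually uninitialized, but
                               proving that needs the correlation
                               with 'flag' */
        return 0;
    }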