Re: Optimisations and undefined behaviour

On 09/11/15 17:31, Andrew Haley wrote:
> On 11/09/2015 03:56 PM, David Brown wrote:
> 
>> We typically cannot use "sanitize" options, nor can we accept that a
>> bug in one part of the program causes undue and unnecessarily
>> damaging side-effects in other parts.
> 
> Well, you have to get used to that.  It is reality: computers work
> that way.  I'm sure you know that if you hit the wrong I/O port
> with a wild write odd things will happen.  Whether that's "undue" or
> "unnecessary" I couldn't say: it just is.

There is no doubt that with C, it is easy to make mistakes that can lead
to disaster.  One little typo, and you access an array outside of its
bounds - with the result of corrupting a completely independent piece of
data, showing up as a problem in a different part of the program that
was already tested and working fine.  These things are part of the risk
of C programming - it's the cost you pay for maximum speed and efficiency.

But let's look back at Annex L in the C11 standard, and ask what it is
/for/.  It is labelled "Analyzability" - the aim is to help developers
(and code reviewers, and static analysers) figure out if code is correct
or not.  Part of that is to place bounds on the effects of some types of
undefined behaviour.  Now, you can certainly argue that Annex L doesn't
go far enough to be useful, or it is too vague, or that it is
impractical or limiting to implement, or that knock-on effects can
quickly turn bounded undefined behaviour into unbounded undefined
behaviour.  But the very existence of this Annex in the C standard
shows how important these things are.

To my mind, if execution hits UB, then it's a bug in the program.  It
doesn't really matter if the bug is UB, or an error in the algorithm, or
a coding mistake, or even a mistake in the customer's requirement
specifications.  It's a point where the computer does not do what the
user expects it to do.  Is it so unreasonable to want code that is
correct up to that bug to run correctly until the bug is hit?  Is it
unreasonable to want the compiler to warn when it spots such a bug and
uses it to alter the code's earlier behaviour?

I know that once code hits the bug, all sorts of things can go wrong.
All I ask is that the compiler should not make things worse - at least
not without informing the developer.


> 
> C definitely works that way.  Maybe there should be a nice small
> language which is useful for embedded developers and doesn't have
> all the interesting UB properties that C has.  (Ada, maybe?  Probably
> not.)  Maybe you could define a language compatible with C with the UB
> removed.  But defining the semantics of such a language would not be
> easy.  And I don't think it makes much sense to change GCC without
> such a rigorous language definition.
> 

I don't believe it is possible to make a general and efficient
programming language without UB.  And even if it could be eliminated at
the language level, it would still exist at a higher level - a square
root function could well have undefined behaviour for negative inputs.
I am also happy that gcc can make many small optimisations based on its
knowledge of UB - strength reduction and constant folding in integer
expressions are key examples.

But I can also see the potential for optimisations based on UB to make
developers' lives much more difficult by working logically backwards in
time.

(Note that in all my testing, I have not found gcc to perform any of
these unwanted "optimisations" that I fear - but I haven't found good
reason to be sure that they won't be performed in the future.)


And yes, I realise that taken to extremes, this is ultimately an
impossible task.  The point is merely to aim in that direction.  I don't
expect gcc to warn on every possible error - but I am happy to see it
warning on more and more possible errors.  What I don't want to see is
the compiler refusing all responsibility for trying, on the grounds
that it is impossible to know /exactly/ what the programmer wanted when
there is UB in the code.  That would IMHO be a disservice to the user,
making it harder to find and fix bugs.






