Re: Optimisations and undefined behaviour

David Brown <david.brown@xxxxxxxxxxxx> · Fri, 06 Nov 2015 22:46:04 +0100

On 06/11/15 21:56, Rena wrote:
> On Nov 6, 2015 2:11 PM, "David Brown" <david.brown@xxxxxxxxxxxx> wrote:
 >
 > On 06/11/15 16:48, Andrew Haley wrote:
 >>
 >> On 11/06/2015 02:44 PM, David Brown wrote:
 >
 >
 >>> I am not sure I want to /control/ the kind of optimisations allowed
 >>> in C - it's more that I want to understand them, and in particular I
 >>> want to understand how they may be implemented in future versions of
 >>> real-world compilers (gcc in particular).
 >>
 >>
 >> Anything not excluded by the standard may be implemented.  There is no
 >> other safe assumption.  From time to time compiler writers choose not
 >> to implement a particular optimization because it doesn't gain much
 >> and it's too likely to break code.  But speaking for myself, I'm happy
 >> to fix UB bugs in my code and let the compiler optimize as much as
 >> possible.
 >
 >
 > I don't think that is unreasonable.  I certainly think that fixing
any UB bugs in the users' code is the best possibility, and I'm glad gcc
is getting steadily better at helping here (recent versions have had
many new and helpful warnings, including several for UB, as well as the
runtime sanitizer options).
 >
 > My aim with this discussion here was really to get an idea of how gcc
developers feel about these things.  My inspiration was from some posts
in the comp.lang.c newsgroup, from others who feel that C compilers should
not try to be "too smart" and that aggressive optimisations "risk
breaking perfectly good code" (in their eyes).  I guess that is why you
still have "-O0" and "-O1" optimisation levels!
 >
 > Personally, I do agree that the compiler should be free to exploit
undefined behaviour in many ways - I just also want developers
(including myself) to have the best chances of spotting code errors as
soon as possible, and to avoid errors in one part of the code
manifesting themselves as symptoms in apparently unrelated areas of code.
 >
 >
 >>
 >>> There is no doubt that compilers have been getting smarter (for
 >>> which the gcc developers deserve praise and thanks) - but it opens
 >>> more possibilities for people to write code that they think is
 >>> correct, but is actually wrong.  So I believe I am talking to the
 >>> correct people, although I might not be expressing myself very well
 >>> - I am interested in the implementations of C here, rather than the
 >>> standards.  It is the implementations that have made more aggressive
 >>> use of undefined behaviour in recent years - the standards haven't
 >>> changed in this respect (except for Annex L).
 >>
 >>
 >> But if you write in C you won't be affected at all.  It's only people
 >> writing undefined code who will be affected by the issues we have
 >> discussed.  And the correct solution to your problem, IMHO, would be
 >> to redefine "undefined" in a way that excludes the behaviour that you
 >> think people find unexpected.  But again, I think you've got the wrong
 >> language: there are other languages without these pitfalls.
 >
 >
 > I know /I/ have the right language (well, I also use C++ a little -
but everything that applies to C here also applies to C++), and the
right compiler.  But certainly not everyone who writes in C has the
right language - I think many who use it would be better using a
different language, for all sorts of reasons, including UB, bounds and
range checking, etc.  (Many may also feel safer using "-O1 -fwrapv -Wall
-Wextra" - in my experience, a lot of users are unaware of most gcc flags.)
 >
 >
 >>
 >>> Incidentally, what do you think of the possibility of a "dont_care()"
 >>> expression?  I believe it would allow programmers to be clearer to the
 >>> compiler that they are not interested in the exact results, but they
 >>> don't want /undefined/ behaviour.
 >>
 >>
 >> It would be interesting to see if such a thing can be defined in a
 >> semantically rigorous way.  And explained to "ordinary" programmers!
 >> :-)
 >>
 >
 > I have sometimes used a cast to void as a way of telling the compiler
that I know a value is not being used (such as after reading from a
volatile hardware register in a microcontroller).  What we need here is
a cast /from/ void!
 >
 > Anyway, thanks for your time and thoughts.  I believe I have the
answers I was looking for at the moment.
 >
 > David
 >
 >
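The cast-to-void idiom described above, sketched for a hypothetical status register that clears its pending flags when read. A plain volatile variable stands in for the memory-mapped register (an assumption, so the sketch compiles on a host); on real hardware it would be a volatile access to a fixed address taken from the device header.

```cpp
#include <cstdint>

// Stand-in (assumption) for a memory-mapped status register.
volatile std::uint32_t status_reg = 0x5u;

// Reading the register is what matters; the value itself is not needed.
// The cast to void keeps the volatile read and marks the discard as deliberate.
void clear_pending_flags() {
    (void)status_reg;
}
```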

> Maybe "x = void" could be a way to say "I don't care what the value of x
> is" (and likewise "return void" for "I don't care what the return value
> is"). But this is something to discuss at the standards level, not the
> compiler level.

I don't expect gcc to actually implement "x = void" (unless the C 
standards introduce it, which is unlikely) - a __builtin_dont_care() 
function would be a lot more likely.  But as Andrew says, it's not clear 
exactly how it would be specified.

> Regarding specific cases of undefined behavior such as integer overflow,
> is there/could there be a way to specify the desired behavior without
> having to explicitly write it every time? Like "if any integer overflow
> occurs in this block, let it wrap around / truncate it / assume I don't
> care about the result". I'm thinking of emulator code that relies on the
> wraparound behavior, and would get a lot uglier and potentially slower
> if it had to explicitly handle overflow after every operation.

gcc has a way to get this - the "-fwrapv" flag.  I believe that also 
works as an optimize attribute for individual functions, though I 
haven't tested it.
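For comparison, a sketch of two ways emulator-style code can get wraparound without relying on -fwrapv or on the (untested) optimize attribute. Unsigned types wrap modulo 2^N by definition in both C and C++, and GCC's __builtin_add_overflow (also supported by Clang) stores the infinitely precise result converted to the destination type, which for int on these compilers means the wrapped value. The Cpu8 struct and the function names are hypothetical.

```cpp
#include <climits>
#include <cstdint>

// Hypothetical 8-bit CPU state for an emulator sketch.
struct Cpu8 {
    std::uint8_t a = 0;    // accumulator
    bool carry = false;    // carry flag
};

// Emulated ADD: the sum is computed in unsigned, where wrapping is defined,
// so there is no UB and no compiler flag is needed.
void emulate_add(Cpu8& cpu, std::uint8_t operand) {
    unsigned sum = static_cast<unsigned>(cpu.a) + operand;
    cpu.carry = sum > 0xFFu;                  // bit 8 becomes the carry flag
    cpu.a = static_cast<std::uint8_t>(sum);   // keep the low 8 bits (modulo 256)
}

// Wrapping signed addition: the builtin's return value (the overflow flag)
// is deliberately ignored; only the wrapped result is used.
inline int wrapping_add(int a, int b) {
    int result;
    (void)__builtin_add_overflow(a, b, &result);
    return result;
}
```

One caveat with the unsigned route: types narrower than int, such as uint16_t, are promoted to signed int before arithmetic, so multiplying two uint16_t values can still overflow int. Converting an operand to unsigned first, as emulate_add does for its addition, avoids that.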

> Maybe that could even be a type specification, e.g.:
> wraparound int x; //use wrap behavior if x overflows
> truncate int y; //truncate y if it overflows
> int z; //let z overflowing be undefined behavior as usual
>
> But again this would have to be a standard feature.

It would not have to be a standard feature - gcc could certainly 
implement something like this.  My understanding is that the gcc 
developers prefer these things to be in the standards, or at least in 
drafts for extensions and enhancements (like named address spaces, or 
fixed point types).  But they can also be implemented as type or 
variable attributes (such as different "modes").

Personally, I'd love to see this sort of thing in C - preferably 
standardised, but a gcc extension would be fine.  There can be times 
when you want integer overflow to be defined in some way, such as 
wrapping or saturation (I think that is what you mean by "truncate"), 
and for some processors this would fit well with the underlying 
architecture.  But it would not be easy to see how these would work in 
different circumstances - in particular, it would be difficult to figure 
out what would happen when mixing types.  How would you add a wrapping 
int to a saturating int?
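What saturation means for a single type is the easy part. A minimal sketch for plain int, assuming GCC or Clang for __builtin_add_overflow (a compiler without it would need an explicit range check); the name saturating_add is hypothetical, and it does not answer the harder question of mixing types.

```cpp
#include <climits>

// Saturating signed addition: on overflow, clamp to INT_MAX or INT_MIN
// instead of wrapping. The builtin returns true when the exact sum
// does not fit in *res.
inline int saturating_add(int a, int b) {
    int result;
    if (__builtin_add_overflow(a, b, &result)) {
        // Overflow is only possible when a and b have the same sign,
        // so the sign of b tells us which limit was exceeded.
        return b > 0 ? INT_MAX : INT_MIN;
    }
    return result;
}
```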

I suspect the only practical way here is to use C++ classes, using tools 
such as the overflow builtins or even inline assembly to ensure optimal 
code if gcc can't figure it out by itself.  Then all the interaction 
rules will be clearly defined in the classes.
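A minimal sketch of that approach, with two hypothetical wrapper types whose names and mixing rule are purely illustrative. Explicit constructors stop either type converting silently into the other, so "wrapping plus saturating" does not compile until someone writes the rule down; this sketch arbitrarily decides that a mixed sum saturates.

```cpp
#include <climits>

// Hypothetical wrapper types (names are illustrative, not from any library).
class Wrapping {
public:
    explicit Wrapping(int v) : v_(v) {}
    int value() const { return v_; }

    // Overflow wraps: the builtin stores the wrapped result in r.
    friend Wrapping operator+(Wrapping x, Wrapping y) {
        int r;
        (void)__builtin_add_overflow(x.v_, y.v_, &r);
        return Wrapping(r);
    }

private:
    int v_;
};

class Saturating {
public:
    explicit Saturating(int v) : v_(v) {}
    int value() const { return v_; }

    // Overflow clamps; it can only occur when both operands share a sign,
    // so the sign of y says which limit was crossed.
    friend Saturating operator+(Saturating x, Saturating y) {
        int r;
        if (__builtin_add_overflow(x.v_, y.v_, &r))
            return Saturating(y.v_ > 0 ? INT_MAX : INT_MIN);
        return Saturating(r);
    }

private:
    int v_;
};

// The mixing rule, written down explicitly. Without these two overloads,
// Wrapping + Saturating is a compile error.
inline Saturating operator+(Wrapping x, Saturating y) {
    return Saturating(x.value()) + y;
}
inline Saturating operator+(Saturating x, Wrapping y) {
    return x + Saturating(y.value());
}
```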