On Sun, Feb 27, 2022 at 12:22 PM Segher Boessenkool
<segher@xxxxxxxxxxxxxxxxxxx> wrote:
>
> Requiring to annotate every place that has UB (or *can* have UB!) by the
> user is even less friendly than having so much UB is already :-(

Yeah, I don't think that's the solution. In fact, I don't think that's
even practically the _issue_.

Honestly, a lot of "undefined behavior" in C is quite often of the
kind "the programmer knows what he wants".

Things like word size or byte order issues etc are classic "undefined
behavior" in the sense that the C compiler really doesn't understand
them. The C compiler won't silently fix any silly behavior you get
from writing files in native byte order, and them not working on other
platforms.

Same goes for things like memory allocators - they often need to do
things that the standard doesn't really cover, and shouldn't even
*try* to cover. It's very much a core example of where people do odd
pointer arithmetic and change the type of pointers.

The problem with the C model of "undefined behavior" is not that the
behavior ends up being architecture-specific and depending on various
in-memory (or in-register) representation of the data. No, those
things are often very much intentional (even if in the case of byte
order, the "intention" may be that the programmer simply does not
care, and "knows" that all the world is little endian).

If the C compiler just generates reliable code that can sanely be
debugged - including very much using tools that look for "hey, this
behavior can be surprising", ie all those "look for bad patterns at
run-time", then that would be 100% FINE.

But the problem with the C notion of undefined behavior is NOT that
"results can depend on memory layout and other architecture issues
that the compiler doesn't understand".

No, the problem is that the C standards people - and compiler people -
have then declared that "because this can be surprising, and the
compiler doesn't understand what is going on, now the compiler can do
something *else* entirely".

THAT is the problem.

The classic case - and my personal "beat a dead horse" - is the
completely broken type-based aliasing rules. The standards people
literally said "the compiler doesn't understand this, it can expose
byte order dependencies that shouldn't be explained, SO THE COMPILER
CAN NOW DO SOMETHING COMPLETELY INSANE INSTEAD".

And compiler people? They rushed out to do completely broken garbage -
at least some of them did. You can literally find compiler people who
see code like this (very traditional, and traditionally very valid and
common):

    // Return the least significant 16 bits of 'x' on LE machines
    #define low_16_bits(x) (*(unsigned short *)&(x))

and say "oh, because that's undefined, I can now decide to not do what
the programmer told me to do AT ALL".

Note that the above wasn't actually even badly defined originally. It
was well-defined, it was used, and it was done by programmers that
knew what they were doing.

And then the C standards people decided that "because our job isn't to
describe all the architectural issues you can hit, we'll call it
undefined, and in the process let compiler people intentionally break
it".

THAT is a problem. Undefined results are often intentional in system
software - or they can be debugged using smart tools (including
possibly very expensive run-time code generation) that actively look
for them.

But compilers that randomly do crazy things because the standard was
bad? That's just broken.

If compilers treated "undefined" as the same as
"implementation-defined, but not explicitly documented", then that
would be fine. But the C standards people - and a lot of compiler
people - really don't seem to understand the problems they caused.

And, btw, caused for no actual good reason.
The HPC people who wanted Fortran-style aliasing could easily have had
that with an extension. Yes, "restrict" is kind of a crappy one, but
it could have been improved upon. Instead, people said "let's just
break the language".

Same exact thing goes for signed integer overflow.

               Linus
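
Illustration of the low_16_bits() example from the mail above (a minimal sketch, not part of the original message). It puts the cast-based macro next to a memcpy-based read that ISO C does define, so the two can be compared. The helper names first_16_bits(), host_is_little_endian() and self_check() are hypothetical and exist only for this sketch. The macro itself is left unexercised here, because under type-based aliasing its behavior depends on compiler flags; GCC and Clang accept -fno-strict-aliasing, which the Linux kernel build uses, to make the cast form behave as written.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* The macro from the mail: reads the first two bytes of x through an
 * unsigned short lvalue. ISO C's type-based aliasing rules leave this
 * undefined when x has an incompatible type such as unsigned int. */
#define low_16_bits(x) (*(unsigned short *)&(x))

/* Hypothetical helper: the same two-byte read done with memcpy, which
 * ISO C defines. The result is still byte-order dependent: the low half
 * on little-endian hosts, the high half on big-endian hosts. */
static inline uint16_t first_16_bits(uint32_t v)
{
    uint16_t out;
    memcpy(&out, &v, sizeof out);
    return out;
}

/* Hypothetical helper: returns 1 on little-endian hosts, 0 otherwise. */
static inline int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Hypothetical self-check: the memcpy read follows the host byte order. */
static inline void self_check(void)
{
    const uint32_t v = 0x12345678u;
    assert(first_16_bits(v) ==
           (host_is_little_endian() ? 0x5678u : 0x1234u));
}
```

The memcpy form keeps the architecture dependence the mail describes as intentional (the answer still differs between little-endian and big-endian hosts); it only removes the aliasing question, and optimizing compilers typically reduce it to a single load.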