Re: Problem with debugging -m32 program

Xi Ruoyao <ryxi@xxxxxxxxxxxxxxxxx> · Fri, 13 Jul 2018 16:27:27 +0800

On 2018-07-12 15:38 -0400, Ignitus Boyone wrote:
> I believe the definition of undefined behavior is simply: not defined in the C/C++ specification.
> 
> https://en.cppreference.com/w/cpp/language/ub
> 
> This means that the implementation is responsible for deciding what should be done instead of the spec. Often
> undefined behavior is just where the road ends, because you shouldn’t do it in the first place.

From ISO/IEC 9899:1999 3.4.3 para 1:

> *undefined behavior*
> behavior, upon use of a nonportable or erroneous program construct
> or of erroneous data, for which this International Standard imposes
> no requirements

There are "no requirements".  An implementation can simply assume that
a program contains no undefined behavior.

> I feel a major point of this thread is to erase the idea that undefined means unpredictable. 

No.  Undefined behavior is *totally* unpredictable.

> Writing the variable using pointer arithmetic is very predictable, but when you go past the bounds you might
> overwrite any number of things.

No.  It is *not* predictable.  When you see a snippet of code like this:

    #include <ctype.h>

    int bar(char *y);

    int foo(const char *y)
    {
        int i;
        char x[4] = {0, 0, 0, 0, };

        /* nothing bounds i to the size of x */
        for (i = 0; islower(y[i]); i++)
            x[i] = y[i];

        return bar(x);
    }

You may think "well, this is undefined if y is `abcde` because it will
overwrite the memory beyond the array x".  This is *wrong*.  The reason
is: the compiler may notice that if i >= 4, undefined behavior would
occur.  So the compiler can assume i < 4.  Then the compiler may
decide to unroll the loop to optimize the program:

        if (!islower(y[0]))
            goto ret;
        x[0] = y[0];
        if (!islower(y[1]))
            goto ret;
        x[1] = y[1];
        if (!islower(y[2]))
            goto ret;
        x[2] = y[2];
        if (!islower(y[3]))
            goto ret;
        x[3] = y[3];
        ret: return bar(x);

Note that this optimized program will *never* overwrite the memory beyond
x.  So you can't even predict whether the overwrite will happen.
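
For contrast, here is a minimal sketch of a variant whose behavior is
defined for every input string, so that reading the C source does tell
you what happens.  The name foo_bounded and the body of bar() are
hypothetical placeholders, not part of the example above; bar() here
just counts the nonzero bytes in a 4-byte buffer.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for bar(): counts nonzero bytes in a 4-byte buffer. */
static int bar(char *y)
{
    int n = 0;
    for (size_t i = 0; i < 4; i++)
        if (y[i] != 0)
            n++;
    return n;
}

/* Same idea as foo(), but the index is checked against the size of x
   before y[i] is read, so no execution can write past the array. */
static int foo_bounded(const char *y)
{
    char x[4] = {0, 0, 0, 0};
    size_t i;

    assert(y != NULL);
    /* The cast keeps the argument of islower() in the range it accepts. */
    for (i = 0; i < sizeof x && islower((unsigned char)y[i]); i++)
        x[i] = y[i];

    return bar(x);
}
```

With this version, foo_bounded("abcde") copies only `abcd` and returns 4,
because the bound is part of the program rather than a guess about what
the overflow would do.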

If someone believes that "int y[10]; y[-1] = 42" is undefined because
"it would overwrite the memory before y, and mess up the data", what
would he do next?  Well, he may use a linker script to lay out the
memory like:

0xffff8000 - 0xffff8027: int x[10];
0xffff8028 - 0xffff804f: int y[10];

Then he will say "Well, I can use y[-10] to y[9] now because doing that
will just overwrite array x, with no other side effect!"  That is totally
wrong.  If you write something like

    if (a != 42)
        y[-5] = 1;
    else
        printf("%d\n", y[5]);

    printf("%d\n", a);

Then the compiler could say "What? y[-5] = 1 is ridiculous because it
invokes undefined behavior.  So I can assume a == 42."  Then the entire
code snippet becomes:

    printf("%d\n", y[5]);
    puts("42");

So, y[-5] = 1 is undefined behavior, not because it will mess up the
memory, but because *the standard says so and the compiler can assume
it won't happen*.
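
If the intent really is to index one region with offsets from -10 to 9,
a sketch that stays within the standard is to declare a single array and
point into its middle.  Pointer arithmetic is defined as long as it stays
inside one array object, so nothing is left for the compiler to assume
away.  The names xy, HALF, y_view and store are hypothetical, chosen only
for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum { HALF = 10 };

/* One object of 20 ints; every access below stays inside it. */
static int xy[2 * HALF];

/* Hypothetical helper: a pointer to the middle of xy, so that
   indices -10 .. 9 relative to it all refer to elements of xy. */
static int *y_view(void)
{
    return xy + HALF;
}

/* Store through the middle pointer, rejecting out-of-range indices. */
static void store(ptrdiff_t idx, int value)
{
    assert(idx >= -HALF && idx < HALF);
    y_view()[idx] = value;
}
```

Here store(-5, 1) writes xy[5], and that is guaranteed by the language
rather than by a linker script.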

My example above is based on [1].  In [1] Steve Summit explained that we
should *not* guess how undefined behavior will behave in practice.
Undefined behavior is simply unpredictable.

In Mahmood's example he is trying to overflow a program's buffer.
That definitely invokes undefined behavior.  So he has to analyze the
(maybe disassembled) object code of the program, *not* its C source code.
We can't predict "what will happen on buffer overflow" by reading C code
because GCC (and other C compilers) actually assume there is no such
thing.

[1] http://www.eskimo.com/~scs/readings/undef.950321.html
-- 
Xi Ruoyao <ryxi@xxxxxxxxxxxxxxxxx>
School of Aerospace Science and Technology, Xidian University