Lawrence Crowl wrote:
On 3/16/07, Erik <sigra@xxxxxxx> wrote:
Ian Lance Taylor wrote:
> Erik <sigra@xxxxxxx> writes:
>
>
>> From man:puts I see that it is declared "int puts(const char
>> *)". This means that puts does not promise to leave its argument
>> unchanged. Therefore the caller must push the argument anew before
>> each call. If it had been declared "int puts(const char * const)"
>> instead, the push should be moved outside the loop. Unfortunately
>> this does not seem to work. I tried with the following program:
>> void q(const unsigned int);
>> void f() {for (unsigned int x = 0; x != 10; x++) q(77);}
>>
>> and built it with "gcc -std=c99 -Os -Wall -Wextra -Werror -S":
>> .L2:
>> subl $12, %esp
>> incl %ebx
>> pushl $77
>> call q
>> addl $16, %esp
>> cmpl $10, %ebx
>> jne .L2
>>
>> As you can see, "pushl $77" is still inside the loop even though q
>> promises to not change its argument. This must be a bug.
>
> This is not a bug. const on an automatic variable in C is more
> advisory than anything else. You are not permitted to change a const
> object, but you can cast its address to a non-const pointer.
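To illustrate why the const in the declaration gives the caller nothing to rely on: in C, a top-level const on a by-value parameter is not part of the function's type, so a declaration with it is compatible with a definition without it. A minimal sketch, where the body of q, the variable last_seen, and the helper call_q_ten_times are hypothetical and not from the thread:

```c
#include <stdio.h>

/* Declaration as in the thread: parameter is "const unsigned int". */
void q(const unsigned int);

/* Hypothetical state, only here so the effect of q can be observed. */
static unsigned int last_seen;

/* Hypothetical definition: it omits the const, yet it is compatible
 * with the declaration above, because top-level qualifiers on
 * parameters are ignored when comparing function types. */
void q(unsigned int x)
{
    last_seen = x;
    x = 0;  /* modifies the callee's own copy of the argument */
}

/* Hypothetical caller mirroring the loop in f() from the thread. */
unsigned int call_q_ten_times(void)
{
    unsigned int calls = 0;
    for (unsigned int i = 0; i != 10; i++) {
        q(77);
        calls++;
    }
    return calls;
}
```

Whether the assignment to x actually writes to the argument's stack slot depends on the code the compiler generates, but the caller sees only the declaration and, as far as I understand it, cannot assume the slot is left intact.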
The "lost optimization" in this case has nothing to do with const
versus non-const. The issue is that the call deallocates the
parameters. The pushl is allocating the argument.
OK, as I understand it, the pushl $77 does 2 things:
1. Copy the value 77.
2. Allocate space for it (by changing the stack pointer).
And in C, allocating the space must be done in each iteration because
the called function deallocates it. But copying the value 77 in each
iteration could be optimized away.
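The two effects of the push can be sketched with a toy stack model. This is only an illustration under stated assumptions: the array stack, the index sp, and every function below are hypothetical, and the callee is deliberately written to overwrite its argument slot, which a real q may or may not do.

```c
#include <stdint.h>

enum { STACK_WORDS = 16 };
static uint32_t stack[STACK_WORDS];
static int sp = STACK_WORDS;          /* grows downward, like %esp */

/* push = allocate one slot, then copy the value into it */
static void push(uint32_t v) { sp -= 1; stack[sp] = v; }

/* release = give back slots (the "addl"/"popl" after the call) */
static void release(int words) { sp += words; }

/* Hypothetical callee: reads its argument, then clobbers the slot. */
static uint32_t callee(void)
{
    uint32_t seen = stack[sp];
    stack[sp] = 0;
    return seen;
}

/* Push inside the loop: every call sees 77. */
uint32_t sum_push_each_call(void)
{
    uint32_t total = 0;
    for (int i = 0; i != 10; i++) {
        push(77);
        total += callee();
        release(1);
    }
    return total;
}

/* Push hoisted out of the loop: only the first call sees 77. */
uint32_t sum_hoisted_push(void)
{
    uint32_t total = 0;
    push(77);
    for (int i = 0; i != 10; i++)
        total += callee();
    release(1);
    return total;
}
```

In this model the hoisted version is equivalent to the original only if the callee is known not to write to its argument slot, which is the information the caller would need before skipping the copy.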
But what about Ada? Does it have the same calling convention? See the Ada
loop:
.L5:
pushl $77
.LCFI3:
call _ada_q
popl %eax
decl %ebx
jns .L5
It looks like in Ada, the caller deallocates the parameter (popl). If
this is so, it would mean that both the push and the pop should be moved
out of the loop, something like:
pushl $77
.L5:
.LCFI3:
call _ada_q
decl %ebx
jns .L5
popl %eax
This would cut the loop down to only 3 instructions. Not bad compared to
the 7 instructions in the original C loop. But the C version should
probably be counted as having 8 instructions to make the comparison
accurate, since it has a hidden popl inside q, which is not in Q of the
Ada version. Is this correct?