On 08/18/2015 11:30 AM, Lionel Villard wrote:
Hi,
consider this small C program:
#include <stdlib.h>
int tailoptimized(int c1, int c2, int c3, int c4, int c5, int c6)
{
if (c1 > c5)
{
if (c4 < 5)
{
return c3 + c2;
}
}
else if (c5 > 45)
return c6;
return c3 + c4 + c5 + c6;
}
int nottailoptimized(int c1, int c2, int c3, int c4, int c5, int c6, int
c7)
{
if (c2 > c5)
{
if (c2 < 5)
{
return c3 + c2;
}
}
else if (c1 > 45)
return c6;
return c1 + c4 + c5 + c6;
}
int call(int count)
{
if (count > 100)
{
return tailoptimized(count *2, count, count * 3, count, count,
count);
}
return nottailoptimized(count, count* 2, count, count, count, count,
count * 4);
}
int main (int argc, char* argv[])
{
call(atoi(argv[0]));
}
I used this command to compile it:
gcc -O1 -foptimize-sibling-calls -S -c test.c
and generated assembly code for the 'call' function is:
call:
.LFB17:
.cfi_startproc
cmpl $100, %edi
jle .L12
leal (%rdi,%rdi), %eax
leal (%rax,%rdi), %edx
movl %edi, %r9d
movl %edi, %r8d
movl %edi, %ecx
movl %edi, %esi
movl %eax, %edi
jmp tailoptimized
.L12:
subq $8, %rsp
.cfi_def_cfa_offset 16
leal (%rdi,%rdi), %esi
leal 0(,%rdi,4), %eax
movl %eax, (%rsp)
movl %edi, %r9d
movl %edi, %r8d
movl %edi, %ecx
movl %edi, %edx
call nottailoptimized
addq $8, %rsp
.cfi_def_cfa_offset 8
ret
.cfi_endproc
As you can see, when a function takes more than 6 arguments
(nottailoptimized in this case), TCO is turned off.
What's the reason for this limitation (number of registers?)? Is there a
flag/option that would allow more than 6 arguments?
There's no inherent limit on the number of arguments. It's constrained
more by the need to avoid stack allocation in the caller.
In your example "call" must allocate a stack slot for the last arugment
to "nottailoptimized". The x86_64 ABI mandates that "call" would also
be responsible for deallocating that stack slot. That obviously can't
happen if the call to "nottailoptimized" was turned into a tail call.
"tailoptimized" doesn't suffer from this problem because it does not
need any of its arguments passed in the stack.
In theory, the tail call optimization could be enhanced to create a
clone of "nottailoptimized" which would de-allocate the stack for any
arguments and the tail call optimization could jump to that special
copy. Nobody's implemented that. It's not even clear if it'd be a good
idea or not.
Jeff