On 2/3/10 8:01 PM, Shaun Spiller wrote:
I've noticed in the assembly that GCC generates (kept via the
-save-temps option) that it avoids pushing arguments onto the stack
when calling functions. Instead, it reserves space on the stack once in
the calling function and uses mov to store the arguments directly into
that space (even with -Os).
Then, for an unrelated reason, I added -mno-stack-arg-probe to the
command line and noticed that the number of lines of *.s output
generated for my project had gone up by about 3%, while the size of
the resulting binary had gone down by about 7%. When I looked at the
assembly again to find out why, I found that it was suddenly passing
arguments via push.
Example function follows.
Without -mno-stack-arg-probe:
push ebp
mov ebp, esp
push esi
push ebx
sub esp, 16
mov esi, DWORD PTR __ZL5width
sal esi
mov ebx, DWORD PTR __ZL16framebuffer_size
sub ebx, esi
mov DWORD PTR [esp+8], ebx
lea ecx, [esi+753664]
mov DWORD PTR [esp+4], ecx
mov DWORD PTR [esp], 753664
call __Z7memmovePvPKvj
mov DWORD PTR [esp+8], esi
movzx edx, BYTE PTR __ZL9backcolor
sal edx, 4
or dl, BYTE PTR __ZL9forecolor
movzx eax, dl
sal eax, 8
mov DWORD PTR [esp+4], eax
add ebx, 753664
mov DWORD PTR [esp], ebx
call __Z9memset_16Pvij
add esp, 16
pop ebx
pop esi
leave
ret
With -mno-stack-arg-probe:
push ebp
mov ebp, esp
push esi
push ebx
mov esi, DWORD PTR __ZL5width
sal esi
mov ebx, DWORD PTR __ZL16framebuffer_size
sub ebx, esi
push ecx
push ebx
lea ecx, [esi+753664]
push ecx
push 753664
call __Z7memmovePvPKvj
add esp, 12
push esi
movzx edx, BYTE PTR __ZL9backcolor
sal edx, 4
or dl, BYTE PTR __ZL9forecolor
movzx eax, dl
sal eax, 8
push eax
add ebx, 753664
push ebx
call __Z9memset_16Pvij
add esp, 16
lea esp, [ebp-8]
pop ebx
pop esi
leave
ret
This behavior seems extremely peculiar. Can anyone shed some light?
Does this relate to a specific version of gcc, to a specific set of
options, or to functions eligible for -finline-functions?
The Intel Nocona architecture benefited particularly from
optimizations that reduced the amount of data pushed onto the stack.
Tim Prince