Hi list,

(Please Cc: me when replying, I am not subscribed to this list. Thanks.)

For the last couple of days, I've looked at the assembly generated by
various GCC releases. FYI, I'm working on FreeBSD, but as far as I have
been able to verify, there doesn't seem to be much difference with Linux.
All compilations have been performed with the -O flag.

I've written the following useless (and vulnerable) program:

% #include <string.h>
%
% int
% main(int ac, char *av[])
% {
%     char buf[16];
%
%     if (ac < 2)
%         return 0;
%     strcpy(buf, av[1]);
%     return 1;
% }

Theoretically, the most basic The corresponding stack right before
main() function should be     calling strcpy():
something like this:

%   push %ebp                 %       |   av   |
%   mov %esp, %ebp            %       |   ac   |
%   sub $16, %esp             %       |  ret   |
%   cmp $1, 8(%ebp)           % ebp-> |  sebp  | (saved ebp)
%   jle .byebye0              %       |/ / / / | ^
%   mov 12(%ebp), %eax        %       | / / / /| |
%   push 4(%eax)              %       |/ / / / | | (buf, 16 bytes)
%   lea -16(%ebp), %eax       %       | / / / /| v
%   push %eax                 %       | av[1]  |
%   call strcpy               % esp-> |  &buf  |
%   mov $1, %eax
%   jmp .byebye
% .byebye0:
%   mov $0, %eax
% .byebye:
%   leave
%   ret

With GCC 2.8.1, this is pretty close to what I expected:

% main:                       % same stack as above
%   pushl %ebp
%   movl %esp,%ebp
%   subl $16,%esp
%   cmpl $1,8(%ebp)
%   jle .L2
%   movl 12(%ebp),%eax
%   pushl 4(%eax)
%   leal -16(%ebp),%eax
%   pushl %eax
%   call strcpy
%   movl $1,%eax
%   jmp .L3
%   .align 4
% .L2:
%   xorl %eax,%eax
% .L3:
%   leave
%   ret

With GCC 2.95.3, this is mostly the same thing, except that it allocates
24 bytes on the stack instead of 16. Note that it still passes a 16-byte
buffer to strcpy(); what is the purpose of the 8 additional bytes?

% main:                       %       |   av   |
%   pushl %ebp                %       |   ac   |
%   movl %esp,%ebp            %       |  ret   |
%   subl $24,%esp             % ebp-> |  sebp  | (saved ebp)
%   cmpl $1,8(%ebp)           %       |/ / / / | ^ ^
%   jle .L3                   %       | / / / /| | |
%   addl $-8,%esp             %       |/ / / / | | | (buf, 16 bytes)
%   movl 12(%ebp),%eax        %       | / / / /| | v
%   pushl 4(%eax)             %       |////////| | ^
%   leal -16(%ebp),%eax       %       |////////| v v (8 bytes, unused)
%   pushl %eax                %       | av[1]  |
%   call strcpy               % esp-> |  &buf  |
%   movl $1,%eax
%   jmp .L4
%   .p2align 4,,7
% .L3:
%   xorl %eax,%eax
% .L4:
%   leave
%   ret

With GCC 3.4.6, things begin to look quite weird to me. First, it
allocates 24 bytes on the stack, which will be used for buf. Contrary to
GCC 2.95.3, a 24-byte buffer is passed to strcpy(). What's the purpose of
using 24 bytes instead of 16? Then %esp is aligned on a 16-byte boundary
and an unused 16-byte buffer is allocated before comparing the argument
count. Before setting up the stack for strcpy(), another unused 8-byte
buffer is allocated! I can't get the logic behind this. Any explanation
would be welcome.

% main:                       %       |   av   |
%   pushl %ebp                %       |   ac   |
%   movl %esp, %ebp           %       |  ret   |
%   subl $24, %esp            % ebp-> |  sebp  | (saved ebp)
%   andl $-16, %esp           %       |/ / / / | ^
%   subl $16, %esp            %       | / / / /| |
%   movl $0, %eax             %       |/ / / / | |
%   cmpl $1, 8(%ebp)          %       | / / / /| | (buf, 24 bytes)
%   jle .L1                   %       |/ / / / | |
%   subl $8, %esp             %       | / / / /| v
%   movl 12(%ebp), %eax       %       |\\\\\\\\| ^
%   pushl 4(%eax)             %       |\\\\\\\\| | (stack alignment)
%   leal -24(%ebp), %eax      %       |\\\\\\\\| v (variable size)
%   pushl %eax                %       |/ / / / | ^
%   call strcpy               %       | / / / /| |
%   movl $1, %eax             %       |/ / / / | | (16 bytes, unused)
% .L1:                        %       | / / / /| v
%   leave                     %       |\ \ \ \ | ^
%   ret                       %       | \ \ \ \| v (8 bytes, unused)
%                             %       | av[1]  |
%                             % esp-> |  &buf  |

With GCC 4.2.1, things are weird too, but differently. Functionally it is
correct, but it is very far from what I expected initially. In the
prologue, before creating a new stack frame, the stack is aligned on a
16-byte boundary.
Then `ret' is pushed once more and the new stack frame is created.
Afterward, the address of `ac' is pushed and a 36-byte buffer (9 words)
is allocated. This buffer actually contains the 4 words of buf at the top
and the 2 arguments for strcpy() at the bottom. But there are still 3
unused words. Why so?

An additional puzzling behaviour is that av is loaded into %edx before
the argument count comparison, while it would be more logical to do it
afterward. Any idea why GCC does so?

% main:                       %       |   av   |
%   leal 4(%esp), %ecx        %       |   ac   | ([1])
%   andl $-16, %esp           %       |  ret   |
%   pushl -4(%ecx)            %       |\\\\\\\\| ^
%   pushl %ebp                %       |\\\\\\\\| | (stack alignment)
%   movl %esp, %ebp           %       |\\\\\\\\| v (variable size)
%   pushl %ecx                %       |  ret   |
%   subl $36, %esp            % ebp-> |  sebp  |
%   movl 4(%ecx), %edx        %       |  &[1]  |
%   movl $0, %eax             %       |/ / / / | ^ ^
%   cmpl $1, (%ecx)           %       | / / / /| | |
%   jle .L4                   %       |/ / / / | | | (buf, 16 bytes)
%   movl 4(%edx), %eax        %       | / / / /| | v
%   movl %eax, 4(%esp)        %       |////////| | ^
%   leal -20(%ebp), %eax      %       |////////| | |
%   movl %eax, (%esp)         %       |////////| | v
%   call strcpy               %       | av[1]  | |
%   movl $1, %eax             % esp-> |  &buf  | v
% .L4:
%   addl $36, %esp
%   popl %ecx
%   popl %ebp
%   leal -4(%ecx), %esp
%   ret

I'm sorry if these are dumb questions; please just let me know if this is
the case. Any mailing-list or documentation pointer would be welcome.

Thank you.
Regards,
-- 
Jeremie Le Hen
< jeremie at le-hen dot org >< ttz at chchile dot org >