need an explanation on assembly generated by various GCC releases

Hi list,

(Please Cc: me when replying; I am not subscribed to this list.  Thanks.)

For the last couple of days, I've been looking at the assembly generated
by various GCC releases.  FYI, I'm working on FreeBSD but, according to
my checks, there doesn't seem to be much difference from Linux.

All compilations have been performed with the -O flag.


I've written the following useless (and vulnerable) program:
% #include <string.h>
% 
% int
% main(int ac, char *av[])
% {
%         char buf[16];
% 
%         if (ac < 2)
%                 return 0;
%         strcpy(buf, av[1]);
%         return 1;
% }


Theoretically, the most basic	 The corresponding stack right before
main() function should be	 calling strcpy():
something like this:

%   push %ebp			%	|   av   |
%   mov %esp, %ebp		%	|   ac   |
%   sub $16, %esp		%	|   ret  |
%   cmp $1, 8(%ebp)		% ebp->	|  sebp  | (saved ebp)
%   jle .byebye0		%	|/ / / / | ^
%   mov 12(%ebp), %eax		%	| / / / /| |
%   push 4(%eax)		%	|/ / / / | | (buf, 16 bytes)
%   lea -16(%ebp), %eax		%	| / / / /| v
%   push %eax			%	|  av[1] |
%   call strcpy			% esp->	|  &buf  |
%   mov $1, %eax		%
%   jmp .byebye			%
% .byebye0:			%
%   mov $0, %eax		%
% .byebye:			%
%   leave			%
%   ret				%
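To make the diagram easier to check against the listing, here is a small C sketch that derives the %ebp-relative offsets of this idealized frame.  It assumes 32-bit x86 with 4-byte stack slots and cdecl argument passing (arguments pushed right to left); `struct frame_offsets` and `idealized_frame()` are hypothetical names used only for illustration.

```c
#include <assert.h>

#define SLOT 4 /* assumption: 32-bit x86, every push moves %esp by 4 bytes */

/* Offsets relative to %ebp once "push %ebp; mov %esp, %ebp" has run. */
struct frame_offsets {
    int ret;  /* return address pushed by main()'s caller */
    int ac;   /* first argument of main() */
    int av;   /* second argument of main() */
    int buf;  /* lowest address of the local buffer */
    int arg1; /* second strcpy() argument, av[1] (pushed first) */
    int arg0; /* first strcpy() argument, &buf (pushed last, at (%esp)) */
};

/* Hypothetical helper: offsets for the idealized frame with a
 * buf_size-byte buffer and no alignment padding at all. */
static struct frame_offsets idealized_frame(int buf_size)
{
    struct frame_offsets f;

    assert(buf_size > 0 && buf_size % SLOT == 0);
    f.ret = 1 * SLOT;          /* 4(%ebp); the saved %ebp sits at 0(%ebp) */
    f.ac = 2 * SLOT;           /* 8(%ebp), compared against 1 */
    f.av = 3 * SLOT;           /* 12(%ebp), loaded to reach av[1] */
    f.buf = -buf_size;         /* reserved by the "sub" in the prologue */
    f.arg1 = f.buf - 1 * SLOT; /* first push, just below buf */
    f.arg0 = f.buf - 2 * SLOT; /* second push; %esp points here at the call */
    return f;
}
```

With buf_size = 16 this gives ac at 8, av at 12, buf at -16, av[1] at -20 and &buf at -24, i.e. %esp == %ebp - 24 when strcpy() is called.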




With GCC 2.8.1, this is pretty close to what I expected:

% main:   			%	same stack as above
%   pushl %ebp			%
%   movl %esp,%ebp		%
%   subl $16,%esp		%
%   cmpl $1,8(%ebp)		%
%   jle .L2			%
%   movl 12(%ebp),%eax		%
%   pushl 4(%eax)		%
%   leal -16(%ebp),%eax		%
%   pushl %eax			%
%   call strcpy			%
%   movl $1,%eax		%
%   jmp .L3			%
%   .align 4			%
% .L2:				%
%   xorl %eax,%eax		%
% .L3:				%
%   leave			%
%   ret				%



With GCC 2.95.3, this is mostly the same thing, except that it allocates
24 bytes on the stack instead of 16.  Note that it still passes a
16-byte buffer to strcpy(); what is the purpose of the 8 additional
bytes?

% main:				%	|   av   |
%    pushl %ebp			%	|   ac   |
%    movl %esp,%ebp		%       |   ret  |
%    subl $24,%esp		% ebp-> |  sebp  | (saved ebp)
%    cmpl $1,8(%ebp)		%	|/ / / / | ^ ^
%    jle .L3			%	| / / / /| | |
%    addl $-8,%esp		%	|/ / / / | | | (buf, 16 bytes)
%    movl 12(%ebp),%eax		%	| / / / /| | v
%    pushl 4(%eax)		%	|////////| | ^
%    leal -16(%ebp),%eax	%	|////////| v v (8 bytes, unused)
%    pushl %eax			%	|  av[1] |
%    call strcpy		% esp-> |  &buf  |
%    movl $1,%eax		%
%    jmp .L4			%
%    .p2align 4,,7		%
% .L3:				%
%    xorl %eax,%eax		%
% .L4:				%
%    leave			%
%    ret			%
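One plausible reading of the extra bytes, offered as an assumption rather than a confirmed answer: GCC's i386 back end has a `-mpreferred-stack-boundary` option whose default is 4, i.e. it tries to keep %esp on a 16-byte boundary at each call.  If main() was itself called with %esp 16-byte aligned, the numbers in this listing fall out of simple rounding.  The helpers below (`round_up`, `frame_alloc`, `call_pad`) are hypothetical and only reproduce the arithmetic, not GCC's actual logic.

```c
#include <assert.h>

/* Round n up to the next multiple of align (a power of two). */
static unsigned round_up(unsigned n, unsigned align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes the prologue subtracts from %esp so that the bytes already
 * pushed (return address + saved %ebp) plus the locals end on an
 * align boundary. */
static unsigned frame_alloc(unsigned already_pushed, unsigned locals,
                            unsigned align)
{
    return round_up(already_pushed + locals, align) - already_pushed;
}

/* Padding added before pushing arg_bytes of outgoing arguments so the
 * call happens on an align boundary (the frame itself is aligned). */
static unsigned call_pad(unsigned arg_bytes, unsigned align)
{
    return round_up(arg_bytes, align) - arg_bytes;
}
```

Here frame_alloc(8, 16, 16) yields 24 and call_pad(8, 16) yields 8, matching `subl $24,%esp` and `addl $-8,%esp`.  With a 4-byte boundary instead, frame_alloc(8, 16, 4) yields 16 and call_pad(8, 4) yields 0, which is consistent with the GCC 2.8.1 listing.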



With GCC 3.4.6, things begin to look quite weird to me.  First, it
allocates 24 bytes on the stack, which will be used for buf.  Unlike
GCC 2.95.3, a 24-byte buffer is passed to strcpy().  What's the
purpose of using 24 bytes instead of 16?

Then %esp is aligned on a 16-byte boundary and an unused 16-byte
buffer is allocated before comparing the argument count.  Before setting
up the stack for strcpy(), another unused 8-byte buffer is allocated!
I can't see the logic behind this.  Any explanation would be welcome.

% main:				%	|   av   |
%    pushl   %ebp		%	|   ac   |
%    movl    %esp, %ebp		%       |   ret  |
%    subl    $24, %esp		% ebp-> |  sebp  | (saved ebp)
%    andl    $-16, %esp		%       |/ / / / | ^
%    subl    $16, %esp		%	| / / / /| |
%    movl    $0, %eax		%	|/ / / / | |
%    cmpl    $1, 8(%ebp)	%	| / / / /| | (buf, 24 bytes)
%    jle     .L1		%	|/ / / / | |
%    subl    $8, %esp		%	| / / / /| v
%    movl    12(%ebp), %eax	%	|\\\\\\\\| ^
%    pushl   4(%eax)		%	|\\\\\\\\| | (stack alignment)
%    leal    -24(%ebp), %eax	%	|\\\\\\\\| v  (variable size)
%    pushl   %eax		%	|/ / / / | ^
%    call    strcpy		%	| / / / /| |
%    movl    $1, %eax		%	|/ / / / | | (16 bytes, unused)
% .L1:				%	| / / / /| v
%    leave			%	|\ \ \ \ | ^
%    ret			%	| \ \ \ \| v (8 bytes, unused)
%				%	|  av[1] |
%				% esp-> |  &buf  |
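The `andl $-16, %esp` in this prologue is the usual mask trick: in 32-bit two's complement, -16 is 0xfffffff0, so the AND clears the low four bits and rounds %esp down to a multiple of 16.  Assuming %esp was already word-aligned, the gap it opens is 0, 4, 8 or 12 bytes, which is why the diagram shows it as variable-size.  Note also that the later `subl $8, %esp` plus the two 4-byte pushes add up to 16 bytes, the same pattern as `addl $-8,%esp` in the 2.95.3 listing.  The function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* What "andl $-16, %esp" computes: clear the low 4 bits (round down). */
static uint32_t align_down16(uint32_t esp)
{
    return esp & (uint32_t)-16; /* (uint32_t)-16 == 0xfffffff0 */
}

/* Size of the gap the AND opens below the old %esp. */
static uint32_t align_gap16(uint32_t esp)
{
    assert(esp % 4 == 0); /* assumption: %esp is word-aligned on entry */
    return esp - align_down16(esp);
}
```

For example, with a made-up %esp of 0xbfbfe9ac, align_down16() returns 0xbfbfe9a0 and the gap is 12 bytes; an already aligned value such as 0xbfbfe9a0 is left unchanged.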



With GCC 4.2.1, things are weird too, but in a different way.
Functionally it is correct, but it is very far from what I initially
expected.

In the prologue, before creating a new stack frame, the stack is aligned
on a 16-byte boundary.  Then `ret' is pushed once more and the new
stack frame is created.  Afterward, the address of `ac' is pushed and
a 36-byte buffer (9 words) is allocated.  This buffer actually
contains the 4-word buf at its top and the 2 arguments for strcpy()
at its bottom.  But there are still 3 unused words.  Why so?

An additional puzzling behaviour is that av is loaded into %edx before
the argument count comparison, while it would seem more logical to do it
afterward.  Any idea why GCC does so?

% main:				%	|   av   |
%    leal    4(%esp), %ecx	%	|   ac   | ([1])
%    andl    $-16, %esp		%	|   ret  |
%    pushl   -4(%ecx)		%	|\\\\\\\\| ^
%    pushl   %ebp		%	|\\\\\\\\| | (stack alignment)
%    movl    %esp, %ebp		%	|\\\\\\\\| v  (variable size)
%    pushl   %ecx		%	|   ret  |
%    subl    $36, %esp		% ebp-> |  sebp  |
%    movl    4(%ecx), %edx	%	|  &[1]  |
%    movl    $0, %eax		%	|/ / / / | ^ ^
%    cmpl    $1, (%ecx)		%	| / / / /| | |
%    jle     .L4		%	|/ / / / | | |(buf, 16 bytes) 
%    movl    4(%edx), %eax	%	| / / / /| | v
%    movl    %eax, 4(%esp)	%	|////////| | ^
%    leal    -20(%ebp), %eax	%	|////////| | |
%    movl    %eax, (%esp)	%	|////////| | v
%    call    strcpy		%	|  av[1] | |
%    movl    $1, %eax		% esp-> |  &buf  | v
% .L4:				%
%    addl    $36, %esp		%
%    popl    %ecx		%
%    popl    %ebp		%
%    leal    -4(%ecx), %esp	%
%    ret			%
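The 36 can be reproduced with the same rounding arithmetic, assuming GCC 4.2.1 again aims for a 16-byte boundary at the call to strcpy().  After the `andl`, three words are pushed (the copied return address, %ebp and %ecx, 12 bytes), and the body needs 16 bytes for buf plus 8 bytes of outgoing arguments, which are stored with `movl` rather than pushed.  12 + 16 + 8 = 36 rounds up to 48, so the prologue would subtract 48 - 12 = 36, leaving 12 bytes (the three unused words) as padding.  The sketch also replays the pointer arithmetic of the prologue and epilogue on a made-up %esp, to show that `leal -4(%ecx), %esp` restores the original stack pointer.  All names are hypothetical; no memory is actually touched.

```c
#include <assert.h>
#include <stdint.h>

/* Round n up to the next multiple of align (a power of two). */
static uint32_t round_up(uint32_t n, uint32_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* Bytes to subtract after `pushed` bytes of pushes so that
 * pushed + locals + out_args is a multiple of align. */
static uint32_t sub_amount(uint32_t pushed, uint32_t locals,
                           uint32_t out_args, uint32_t align)
{
    return round_up(pushed + locals + out_args, align) - pushed;
}

struct replay {
    uint32_t esp_at_call; /* %esp when strcpy() is called */
    uint32_t esp_at_ret;  /* %esp right before "ret" */
};

/* Replays only the %esp/%ecx arithmetic of the GCC 4.2.1 listing. */
static struct replay replay_realign(uint32_t esp_at_entry)
{
    struct replay r;
    uint32_t ecx = esp_at_entry + 4;             /* leal 4(%esp), %ecx: &ac */
    uint32_t esp = esp_at_entry & (uint32_t)-16; /* andl $-16, %esp */

    esp -= 4;             /* pushl -4(%ecx): copy of the return address */
    esp -= 4;             /* pushl %ebp (then movl %esp, %ebp) */
    esp -= 4;             /* pushl %ecx */
    esp -= 36;            /* subl $36, %esp */
    r.esp_at_call = esp;  /* args are stored with movl, %esp stays put */

    /* addl $36 and popl %ecx / popl %ebp unwind the frame and give back
     * the saved %ecx; the final leal rebuilds %esp from it. */
    r.esp_at_ret = ecx - 4; /* leal -4(%ecx), %esp */
    return r;
}
```

For instance, replay_realign(0xbfbfe9ac) reports a call-time %esp of 0xbfbfe970 (a multiple of 16) and restores %esp to 0xbfbfe9ac before `ret`.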


I'm sorry if these are dumb questions; please just let me know if this
is the case.  Any mailing-list or documentation pointer would be
welcome.

Thank you.
Regards,
-- 
Jeremie Le Hen
< jeremie at le-hen dot org >< ttz at chchile dot org >
