Re: Strange thing with static Array!

John Fine <johnsfine@xxxxxxxxxxx> · Sat, 28 Feb 2009 16:42:32 -0500

It makes more sense to look at the assembly language generated with some
optimization turned on.  I used -O3.  The results are shockingly bad, 
but a tiny bit better than what you got.
       call    _ZNSirsERi
       movl    -28(%rbp), %ecx
       movslq  %ecx,%rax
       leaq    30(,%rax,4), %rax
       andq    $-16, %rax
       subq    %rax, %rsp
       leaq    15(%rsp), %r12
       andq    $-16, %r12

The basic task is to convert the value from 32 bits to 64 bits, then
multiply by four, then round up to a multiple of 16, then subtract that 
from rsp and use it as the address of the array.
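As a rough sketch of that arithmetic in C++ (not what the compiler emits): the helper names vla_bytes and vla_address are made up for illustration, and the factor 4 assumes sizeof(int) is 4, as it is on x86-64.

```cpp
#include <cstdint>

// Hypothetical helper: bytes to reserve for `int array[n]`, rounded up to 16.
constexpr std::int64_t vla_bytes(std::int32_t n) {
    // The cast widens 32 -> 64 bits with sign extension (the movslq/cltq step),
    // * 4 assumes sizeof(int) == 4, and + 15 followed by & -16 rounds the
    // byte count up to a multiple of 16.
    return (static_cast<std::int64_t>(n) * 4 + 15) & -16;
}

// Hypothetical helper: the array address once that amount is taken off rsp.
constexpr std::uint64_t vla_address(std::uint64_t rsp, std::int32_t n) {
    return rsp - static_cast<std::uint64_t>(vla_bytes(n));
}

static_assert(vla_bytes(1) == 16, "one int still takes a 16-byte slot");
static_assert(vla_bytes(4) == 16, "four ints fit exactly");
static_assert(vla_bytes(5) == 32, "five ints spill into a second slot");
```

If rsp is already a multiple of 16, vla_address returns a multiple of 16 as well, because the amount subtracted is one.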

1) Converting a signed number from 32 bits to 64 bits is harder than
unsigned.  The compiler isn't smart enough to realize that if the value 
were negative the result would crash anyway, so the compiler uses the 
harder signed conversion process (movslq or cltq).

2) The salq $2 in your example is the multiply by four.  I'm not sure 
what the sub and add of 1 are for, but certainly not alignment.

3) To round UP to a multiple of 16, you can add 15 then round down to a 
multiple of 16.  Both versions seem to think they must round twice, 
apparently satisfying alignment requirements on both the resulting rsp
value and the allocated array address.

Actually, rounding just once is plenty to align both the stack and the
allocation.  It also might be faster to round the address down rather 
than round the length up (I'm not sure).
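To make point 1) concrete, here is a small C++ sketch of the two widenings. The function names are invented for illustration. The two agree for every non-negative count and differ only for negative ones.

```cpp
#include <cstdint>

// Signed widening: the sign bit is copied into the upper 32 bits
// (what movslq or cltq does).
constexpr std::uint64_t widen_signed(std::int32_t n) {
    return static_cast<std::uint64_t>(static_cast<std::int64_t>(n));
}

// Unsigned widening: the upper 32 bits become zero. On x86-64 a plain
// 32-bit movl into a register already does this as a side effect.
constexpr std::uint64_t widen_unsigned(std::int32_t n) {
    return static_cast<std::uint64_t>(static_cast<std::uint32_t>(n));
}

static_assert(widen_signed(5) == widen_unsigned(5),
              "identical for a non-negative count");
static_assert(widen_signed(-1) == 0xFFFFFFFFFFFFFFFFULL,
              "sign bits fill the top half");
static_assert(widen_unsigned(-1) == 0x00000000FFFFFFFFULL,
              "top half is zero");
```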

The andq $-16 is the faster way to round down to a multiple of 16.  The 
shrq $4 followed by salq $4 is a slower way.
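A minimal C++ sketch of the last two paragraphs, with invented function names. It shows the single AND and the shift pair from the quoted listing (shrq $4 then salq $4) giving the same result, and the two single-rounding strategies landing on the same address. The agreement of the two strategies assumes rsp starts out 16-byte aligned. It says nothing about which is faster.

```cpp
#include <cstdint>

// Round down to a multiple of 16 with one AND (the andq $-16 form).
constexpr std::uint64_t round_down_mask(std::uint64_t x) {
    return x & ~static_cast<std::uint64_t>(15);
}

// Same result with two shifts (the shrq $4 / salq $4 form).
constexpr std::uint64_t round_down_shift(std::uint64_t x) {
    return (x >> 4) << 4;
}

// Strategy 1: round the length up, then subtract it from rsp.
constexpr std::uint64_t alloc_round_length(std::uint64_t rsp,
                                           std::uint64_t bytes) {
    return rsp - round_down_mask(bytes + 15);
}

// Strategy 2: subtract the raw length, then round the address down.
constexpr std::uint64_t alloc_round_address(std::uint64_t rsp,
                                            std::uint64_t bytes) {
    return round_down_mask(rsp - bytes);
}

static_assert(round_down_mask(0x1237) == round_down_shift(0x1237),
              "mask and shift pair agree");
static_assert(alloc_round_length(0x7fff0000, 20) ==
                  alloc_round_address(0x7fff0000, 20),
              "both strategies agree when rsp starts 16-byte aligned");
```

Either strategy needs only one rounding step to leave both the new rsp and the array address on a 16-byte boundary.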

The leaq 30(,%rax,4) multiplies by 4 and adds 30.  It is nice attention 
to detail for the compiler to merge that together, but rather lame to 
waste another leaq and andq rerounding the rounded result.
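The -O3 sequence can be modeled step by step in C++ to see what the second rounding buys. This is only a sketch of the arithmetic: the function names are made up, n stands for the already widened element count, and rsp is passed in as a plain number.

```cpp
#include <cstdint>

// leaq 30(,%rax,4), %rax  then  andq $-16, %rax
constexpr std::uint64_t o3_length(std::int64_t n) {
    return static_cast<std::uint64_t>((4 * n + 30) & -16);
}

// subq %rax, %rsp
constexpr std::uint64_t o3_rsp(std::uint64_t rsp, std::int64_t n) {
    return rsp - o3_length(n);
}

// leaq 15(%rsp), %r12  then  andq $-16, %r12
constexpr std::uint64_t o3_array(std::uint64_t rsp, std::int64_t n) {
    return (o3_rsp(rsp, n) + 15) & ~static_cast<std::uint64_t>(15);
}

static_assert(o3_length(5) == 48,
              "20 bytes of ints reserve 48; a single round-up needs 32");
static_assert(o3_array(0x7fff0000, 5) == o3_rsp(0x7fff0000, 5),
              "second rounding is a no-op when rsp was already aligned");
```

With an rsp that is already a multiple of 16, the second leaq/andq pair leaves the address unchanged, and the length comes out 16 bytes larger than a single round-up of 4*n would give.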

Bob Plantz wrote:
On Sat, 2009-02-28 at 12:06 -0500, me22 wrote:

You can see what the compiler is doing for you if you look at the
assembly language. Here is the part where the array gets allocated on
the stack (with my comments added):
	call	_ZNSirsERi         # cin >> array_size
	movl	-12(%rbp), %eax    # load array_size
	cltq                       # convert long to quad
	subq	$1, %rax           # make sure the new stack
	addq	$1, %rax           #   pointer meets all the
	salq	$2, %rax           #   alignment specs.
	addq	$15, %rax
	addq	$15, %rax
	shrq	$4, %rax
	salq	$4, %rax
	subq	%rax, %rsp         # allocate the array
	movq	%rsp, -48(%rbp)    # and save pointer to it

I did this on an x86-64 system in 64-bit mode, and I did not worry
through the alignment code to see exactly what's going on. In
particular,
       subq $1, %rax
       addq $1, %rax
is pretty weird. But the real point is where the array gets allocated on
the stack.

- Bob