It makes more sense to look at the assembly language generated with some
optimization turned on. I used -O3. The results are shockingly bad,
but a tiny bit better than what you got.
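call _ZNSirsERi # cin >> array_size
movl -28(%rbp), %ecx # load array_size (32-bit int)
movslq %ecx,%rax # sign-extend it to 64 bits
leaq 30(,%rax,4), %rax # rax = 4*array_size + 30
andq $-16, %rax # round down to a multiple of 16
subq %rax, %rsp # allocate the space on the stack
leaq 15(%rsp), %r12 # r12 = rsp + 15
andq $-16, %r12 # round down: r12 = 16-byte aligned array address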
The basic task is to convert the value from 32-bit to 64-bit, multiply
it by four (the size of an int), round up to a multiple of 16, then
subtract that from rsp and use the result as the address of the array.
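Those four steps can be modelled in C roughly as follows. This is a hedged sketch, not what the compiler literally does: it treats rsp as a plain integer `sp`, assumes a 4-byte int as on x86-64, and the function name `alloc_int_array` is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the stack allocation for an int array of n
   elements. `sp` stands in for %rsp; the return value is the new %rsp,
   which is also the address used for the array. */
static uint64_t alloc_int_array(uint64_t sp, int32_t n)
{
    assert(n >= 0);                         /* a negative size would be a bug anyway */
    int64_t count = (int64_t)n;             /* 1) widen 32 -> 64 bits (movslq/cltq) */
    uint64_t bytes = (uint64_t)count * 4;   /* 2) multiply by sizeof(int) == 4 */
    bytes = (bytes + 15) & ~(uint64_t)15;   /* 3) round up to a multiple of 16 */
    return sp - bytes;                      /* 4) subtract from rsp: that is the array */
}
```

For example, with `sp` = 4096 and ten ints, 40 bytes round up to 48 and the array starts at 4048, which is 16-byte aligned.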
1) Converting a signed number from 32-bit to 64-bit is harder than
converting an unsigned one: on x86-64, writing a 32-bit register already
zeroes the upper 32 bits, but sign extension needs an explicit
instruction. The compiler isn't smart enough to realize that if the value
were negative the program would crash anyway, so it uses the signed
conversion (movslq or cltq).
2) The salq $2 in your example (a left shift by 2) is the multiply by
four. I'm not sure what the sub and add of 1 are for, since they cancel
each other out, but they are certainly not for alignment.
3) To round UP to a multiple of 16, you can add 15 then round down to a
multiple of 16. Both versions seem to think they must round twice,
apparently to satisfy alignment requirements on both the resulting rsp
value and the allocated array's address.
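Point 1 can be illustrated with a small C sketch (the function names are hypothetical). The two widenings agree for non-negative values but give different 64-bit results once bit 31 is set, so the compiler cannot substitute the cheaper zero extension unless it knows the value is non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* Signed widening: copies bit 31 into bits 32..63 (what movslq/cltq do). */
static int64_t widen_signed(int32_t x)
{
    return (int64_t)x;
}

/* Unsigned widening: fills bits 32..63 with zeros, which on x86-64 a
   plain 32-bit mov such as movl already does as a side effect. */
static uint64_t widen_unsigned(uint32_t x)
{
    return (uint64_t)x;
}
```

For 100 both functions return 100; for the bit pattern 0xFFFFFFFF the signed version gives -1 and the unsigned version gives 4294967295.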
Actually, rounding just once is plenty to align both the stack and the
allocation. It might also be faster to round the address down rather
than round the length up (I'm not sure).
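A hedged C model comparing the two approaches, with rsp as a plain integer and 4-byte ints assumed. `alloc_round_twice` mirrors the -O3 listing line by line; `alloc_round_once` is a hypothetical alternative that subtracts the exact size and rounds the address down, which in assembly would presumably be a subq followed by a single andq $-16 on rsp. Both function names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the -O3 sequence. Stores the new %rsp through new_sp and
   returns the array address (the value left in %r12). */
static uint64_t alloc_round_twice(uint64_t sp, int32_t n, uint64_t *new_sp)
{
    assert(n >= 0);
    uint64_t bytes = ((uint64_t)n * 4 + 30) & ~(uint64_t)15; /* leaq 30(,%rax,4); andq $-16 */
    sp -= bytes;                                             /* subq %rax, %rsp */
    *new_sp = sp;
    return (sp + 15) & ~(uint64_t)15;                        /* leaq 15(%rsp); andq $-16 */
}

/* Single rounding: subtract the exact size, then round the address down
   to a 16-byte boundary. The result is both the new %rsp and the array. */
static uint64_t alloc_round_once(uint64_t sp, int32_t n)
{
    assert(n >= 0);
    return (sp - (uint64_t)n * 4) & ~(uint64_t)15;
}
```

Starting from an aligned sp of 4096 with ten ints, the first model moves rsp down by 64 bytes and the second by 48. Rounding the address down works even if the starting sp is misaligned, and it never reserves more than 15 bytes beyond the 4n the array needs.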
The andq $-16 is the faster way to round down to a multiple of 16: it is
a single instruction. The shrq $4 followed by salq $4 is a slower way
that takes two.
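A minimal C sketch showing that the two round-down idioms compute the same value for unsigned 64-bit inputs (the function names are hypothetical); the difference is only in how many instructions they take.

```c
#include <assert.h>
#include <stdint.h>

/* andq $-16: clear the low four bits with a single AND. */
static uint64_t round_down16_and(uint64_t x)
{
    return x & ~(uint64_t)15;   /* ~15 has the same bit pattern as -16 */
}

/* shrq $4 ; salq $4: shift the low four bits out, then shift zeros back in. */
static uint64_t round_down16_shift(uint64_t x)
{
    return (x >> 4) << 4;
}
```

For instance, both turn 70 into 64.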
The leaq 30(,%rax,4) multiplies by 4 and adds 30 in one instruction. It
is nice attention to detail for the compiler to merge those, but rather
lame to waste another leaq and andq re-rounding the already rounded result.
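A hedged C check of the sizes involved, assuming a non-negative array_size and 4-byte ints; the function names are made up for illustration. `size_o3` mirrors the leaq 30/andq pair, and `size_bob` replays the quoted unoptimized listing step by step. Both reserve the same number of bytes, and that is always exactly 16 more than rounding 4n up once.

```c
#include <assert.h>
#include <stdint.h>

/* Round the array size up to a multiple of 16 exactly once. */
static uint64_t size_rounded_once(uint64_t n)
{
    return (4 * n + 15) & ~(uint64_t)15;
}

/* The -O3 listing: leaq 30(,%rax,4) then andq $-16. */
static uint64_t size_o3(uint64_t n)
{
    return (4 * n + 30) & ~(uint64_t)15;
}

/* The quoted unoptimized listing, one statement per instruction.
   Unsigned arithmetic makes n - 1 wrap harmlessly for n == 0,
   and the following + 1 undoes it. */
static uint64_t size_bob(uint64_t n)
{
    uint64_t rax = n;
    rax -= 1;   /* subq $1  */
    rax += 1;   /* addq $1  */
    rax <<= 2;  /* salq $2  */
    rax += 15;  /* addq $15 */
    rax += 15;  /* addq $15 */
    rax >>= 4;  /* shrq $4  */
    rax <<= 4;  /* salq $4  */
    return rax;
}
```

Since 4n is always a multiple of 4, adding 30 instead of 15 always pushes the result past exactly one more multiple of 16. In the -O3 code those 16 bytes are the slack that lets leaq 15(%rsp) / andq $-16 realign the pointer; if rsp is already 16-byte aligned, both the slack and the second rounding are unnecessary.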
Bob Plantz wrote:
On Sat, 2009-02-28 at 12:06 -0500, me22 wrote:
You can see what the compiler is doing for you if you look at the
assembly language. Here is the part where the array gets allocated on
the stack (with my comments added):
call _ZNSirsERi # cin >> array_size
movl -12(%rbp), %eax # load array_size
cltq # convert long to quad
subq $1, %rax # make sure the new stack
addq $1, %rax # pointer meets all the
salq $2, %rax # alignment specs.
addq $15, %rax
addq $15, %rax
shrq $4, %rax
salq $4, %rax
subq %rax, %rsp # allocate the array
movq %rsp, -48(%rbp) # and save pointer to it
I did this on an x86-64 system in 64-bit mode, and I did not worry
through the alignment code to see exactly what's going on. In
particular,
subq $1, %rax
addq $1, %rax
is pretty weird. But the real point is where the array gets allocated on
the stack.
- Bob