Hi. This is a re-post of my previous question, with corrections and test code.

I am trying to create a dynamic library for x86_64-apple-darwin12 with gcc 4.8.1. My understanding is that the x86-64 ABI requires 16-byte stack alignment. However, the following program loses 16-byte alignment between function calls. This results in a memory fault when a dynamic loader stub function is invoked as sub2 calls sub3.

The same test runs correctly when compiled for static linking. Tracing with GDB shows that the stack misalignment is still present, but there is no fault because there is no call to the loader or other ABI entry.

From the assembly code below, it seems that the problem is at the last instruction of the prolog of sub2. subq $8, %rsp is correct for 8-byte alignment, but invalid for 16. The same compilation on our Linux system emits subq $16, %rsp, which I believe is correct.

Is there a straightforward way to preserve 16-byte stack alignment on this target? I tried some of the obvious controls such as -mpreferred-stack-boundary=4, but no relief. There are several kludges that either hide the alignment problem, or add excess code.

Thanks for any insights.
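The alignment arithmetic in question can be checked with a short C sketch. This is only a model of the stack pointer movement, not code from the test case: the helper names is_aligned16, rsp_after_prologue and movdqa_operand_ok are made up for illustration, and the addresses used in the note after the code are copied from the GDB trace in this post. The model assumes the caller's %rsp is 16-byte aligned at the call instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when addr is a multiple of 16. */
bool is_aligned16(uint64_t addr)
{
    return (addr & 0xF) == 0;
}

/*
 * Hypothetical model of a frame-pointer prologue:
 *   callq  target        pushes an 8-byte return address
 *   pushq  %rbp          pushes 8 more bytes
 *   subq   $frame, %rsp  reserves space for locals
 * Returns the %rsp value the function body sees when it makes its own calls.
 */
uint64_t rsp_after_prologue(uint64_t rsp_before_call, uint64_t frame_bytes)
{
    /* Assumption of the model: the caller was aligned at the call site. */
    assert(is_aligned16(rsp_before_call));

    uint64_t rsp = rsp_before_call;
    rsp -= 8;            /* callq: return address */
    rsp -= 8;            /* pushq %rbp */
    rsp -= frame_bytes;  /* subq $frame_bytes, %rsp */
    return rsp;
}

/*
 * movdqa faults when its memory operand is not 16-byte aligned.
 * This checks an operand of the form disp(%rsp).
 */
bool movdqa_operand_ok(uint64_t rsp, uint64_t disp)
{
    return is_aligned16(rsp + disp);
}
```

With the trace values, rsp_after_prologue(0x7fff5fbfe720, 8) gives 0x7fff5fbfe708, which is not a multiple of 16, while a 16-byte frame would give 0x7fff5fbfe700. Likewise movdqa_operand_ok(0x7fff5fbfe628, 0x40) is false, which is consistent with the fault at movdqa %xmm0,0x40(%rsp) in the last step of the trace.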
----------------------------------
main.c:

#include <stdio.h>
extern void sub1( void );
int main()
{
    (void) fprintf (stderr, "Main: Call sub1.\n");
    sub1();
    (void) fprintf (stderr, "Main: sub1 returned.\n");
    return 0;
}

----------------------------------
subs.c:

int sub3( void )
{
    return 99;
}

void sub2( int a )
{
    if (a == 2)
        sub3();
}

void sub1( void )
{
    sub2( 1 );
    sub2( 2 );
}

----------------------------------
Compile commands for Mac:

gcc -g -O0 -dynamiclib -fPIC -fno-common -flat_namespace \
    -Wall -save-temps subs.c -o subs.dylib
gcc -g -Wall main.c subs.dylib

There is additional info on "hidden" compile options found in the debug section of the assembly temp file:

.ascii "GNU C 4.8.1 -fpreprocessed -feliminate-unused-debug-symbols -mmacosx-version-min=10.8.4 -mtune=core2 -g -O0 -fPIC -fno-common\0"

----------------------------------
Compiler:

mac56:~/bugs/gcc/stack-align 241> gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/opt/local/libexec/gcc/x86_64-apple-darwin12/4.8.1/lto-wrapper
Target: x86_64-apple-darwin12
Configured with: /opt/local/var/macports/build/_opt_mports_dports_lang_gcc48/gcc48/work/gcc-4.8.1/configure --prefix=/opt/local --build=x86_64-apple-darwin12 --enable-languages=c,c++,objc,obj-c++,lto,fortran,java --libdir=/opt/local/lib/gcc48 --includedir=/opt/local/include/gcc48 --infodir=/opt/local/share/info --mandir=/opt/local/share/man --datarootdir=/opt/local/share/gcc-4.8 --with-local-prefix=/opt/local --with-system-zlib --disable-nls --program-suffix=-mp-4.8 --with-gxx-include-dir=/opt/local/include/gcc48/c++/ --with-gmp=/opt/local --with-mpfr=/opt/local --with-mpc=/opt/local --with-cloog=/opt/local --enable-cloog-backend=isl --disable-cloog-version-check --enable-stage1-checking --disable-multilib --enable-lto --enable-libstdcxx-time --with-as=/opt/local/bin/as --with-ld=/opt/local/bin/ld --with-ar=/opt/local/bin/ar --with-bugurl=https://trac.macports.org/newticket --with-pkgversion='MacPorts gcc48 4.8.1_3'
Thread model: posix
gcc version 4.8.1 (MacPorts gcc48 4.8.1_3)

----------------------------------
Platform:

uname -a:
Darwin mac56 12.4.0 Darwin Kernel Version 12.4.0: Wed May 1 17:57:12 PDT 2013; root:xnu-2050.24.15~1/RELEASE_X86_64 x86_64

Mac OS version 10.8.4

Hardware config:
  Model Name: Mac Pro
  Model Identifier: MacPro4,1
  Processor Name: Quad-Core Intel Xeon
  Processor Speed: 2.66 GHz
  Number of Processors: 1
  Total Number of Cores: 4
  L2 Cache (per Core): 256 KB
  L3 Cache: 8 MB
  Memory: 8 GB
  Processor Interconnect Speed: 4.8 GT/s
  Boot ROM Version: MP41.0081.B07
  SMC Version (system): 1.39f5
  SMC Version (processor tray): 1.39f5
  Serial Number (system): xxxxxxxxxx
  Serial Number (processor tray): xxxxxxxxxx
  Hardware UUID: xxxxxxxxxx

----------------------------------
Assembly code emitted by gcc (subs.s):

        .text
Ltext0:
        .globl _sub3
_sub3:
LFB0:
LM1:
        pushq   %rbp
LCFI0:
        movq    %rsp, %rbp
LCFI1:
LM2:
        movl    $99, %eax
LM3:
        popq    %rbp
LCFI2:
        ret
LFE0:
        .globl _sub2
_sub2:
LFB1:
LM4:
        pushq   %rbp
LCFI3:
        movq    %rsp, %rbp
LCFI4:
        subq    $8, %rsp
        movl    %edi, -4(%rbp)
LM5:
        cmpl    $2, -4(%rbp)
        jne     L3
LM6:
        call    _sub3
L3:
LM7:
        leave
LCFI5:
        ret
LFE1:
        .globl _sub1
_sub1:
LFB2:
LM8:
        pushq   %rbp
LCFI6:
        movq    %rsp, %rbp
LCFI7:
LM9:
        movl    $1, %edi
        call    _sub2
LM10:
        movl    $2, %edi
        call    _sub2
LM11:
        popq    %rbp
LCFI8:
        ret
LFE2:
        .section __DWARF,__debug_frame,regular,debug

(Omitted the rest of this info section, here to EOF)

----------------------------------
Normal output:

./a.out
Main: Call sub1.
Segmentation fault

----------------------------------
Selected trace output from GDB:

Full trace available on request.

GNU gdb 6.3.50-20050815 (Apple version gdb-1824) (Thu Nov 15 10:42:43 UTC 2012)

Call steps from main to sub1, then sub1 to sub2. Note that the dynamic loader is invoked on each of these calls, and the full (very long) trace is hidden by GDB in "step" mode. Note that 16-byte stack alignment is good within sub1, broken within sub2.

(gdb) step
Main: Call sub1.
6         sub1();
3: $rsp = (void *) 0x7fff5fbfe730
2: $rbp = (void *) 0x7fff5fbfe730
1: x/i $pc  0x100000f29 <main+39>:  callq  0x100000f58 <dyld_stub_sub1>
(gdb)
sub1 () at subs.c:14
14        sub2( 1 );
3: $rsp = (void *) 0x7fff5fbfe720
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003efa <sub1+4>:  mov    $0x1,%edi
(gdb)
sub2 (a=1) at subs.c:8
8         if (a == 2)
3: $rsp = (void *) 0x7fff5fbfe708
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003ee9 <sub2+11>:  cmpl   $0x2,-0x4(%rbp)
(gdb)

Single step the SECOND call from sub1 to sub2. This is to avoid single stepping through the tedious loader process, which was resolved on the first call.

(gdb)
sub1 () at subs.c:15
15        sub2( 2 );
3: $rsp = (void *) 0x7fff5fbfe720
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003f04 <sub1+14>:  mov    $0x2,%edi
(gdb) stepi
0x0000000100003f09  15        sub2( 2 );
3: $rsp = (void *) 0x7fff5fbfe720
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003f09 <sub1+19>:  callq  0x100003f10 <dyld_stub_sub2>
(gdb)
0x0000000100003f10 in dyld_stub_sub2 ()
3: $rsp = (void *) 0x7fff5fbfe718
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003f10 <dyld_stub_sub2>:  jmpq   *0xfa(%rip)        # 0x100004010
(gdb)
sub2 (a=1) at subs.c:7
7       {
3: $rsp = (void *) 0x7fff5fbfe718
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003ede <sub2>:  push   %rbp
(gdb)
0x0000000100003edf  7       {
3: $rsp = (void *) 0x7fff5fbfe710
2: $rbp = (void *) 0x7fff5fbfe720
1: x/i $pc  0x100003edf <sub2+1>:  mov    %rsp,%rbp
(gdb)
0x0000000100003ee2  7       {
3: $rsp = (void *) 0x7fff5fbfe710
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003ee2 <sub2+4>:  sub    $0x8,%rsp
(gdb)
0x0000000100003ee6  7       {
3: $rsp = (void *) 0x7fff5fbfe708
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003ee6 <sub2+8>:  mov    %edi,-0x4(%rbp)
(gdb)
8         if (a == 2)
3: $rsp = (void *) 0x7fff5fbfe708
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003ee9 <sub2+11>:  cmpl   $0x2,-0x4(%rbp)
(gdb)
0x0000000100003eed  8         if (a == 2)
3: $rsp = (void *) 0x7fff5fbfe708
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003eed <sub2+15>:  jne    0x100003ef4 <sub2+22>
(gdb)
9           sub3();
3: $rsp = (void *) 0x7fff5fbfe708
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003eef <sub2+17>:  callq  0x100003f16 <dyld_stub_sub3>
(gdb)
0x0000000100003f16 in dyld_stub_sub3 ()
3: $rsp = (void *) 0x7fff5fbfe700
2: $rbp = (void *) 0x7fff5fbfe710
1: x/i $pc  0x100003f16 <dyld_stub_sub3>:  jmpq   *0xfc(%rip)        # 0x100004018
(gdb)

Now we are going deeper into the dynamic loader stub function for sub3. This is ABI territory, I think. The ABI required alignment was already violated at callq 0x100003f16 just above. A dozen or so instructions later, this is the conclusion:

(gdb)
0x00007fff85ac68a0 in dyld_stub_binder ()
3: $rsp = (void *) 0x7fff5fbfe628
2: $rbp = (void *) 0x7fff5fbfe6e8
1: x/i $pc  0x7fff85ac68a0 <dyld_stub_binder+40>:  mov    %rax,0x30(%rsp)
(gdb)
0x00007fff85ac68a5 in misaligned_stack_error_entering_dyld_stub_binder ()
3: $rsp = (void *) 0x7fff5fbfe628
2: $rbp = (void *) 0x7fff5fbfe6e8
1: x/i $pc  0x7fff85ac68a5 <misaligned_stack_error_entering_dyld_stub_binder>:  movdqa %xmm0,0x40(%rsp)
(gdb)
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: 13 at address: 0x0000000000000000

--Dave