Thank you for the answer and the listserv cluebat.

Take care,

Barrie

On Mon, Aug 7, 2023 at 6:45 AM Richard Earnshaw (lists) <Richard.Earnshaw@xxxxxxx> wrote:

> This should be on gcc-help@xxxxxxxxxxx, not the main gcc@ list. I've
> sent my response there (and hopefully BCC gcc@).
>
>
> On 06/08/2023 01:30, Barrie Slaymaker via Gcc wrote:
> > Hi,
> >
> > I'm cross compiling for 32-bit bare-metal ARMs (modern ones: Cortex-M4
> > and Cortex-M33) w/ gcc 12.3.0, which is the latest available from ARM
> > (see gcc -v output below), and have found that va_arg(..., double)
> > (i.e. __builtin_va_arg()) assumes that doubles are 64-bit aligned, but
> > the stack is not always so.
> >
> > I searched the bug database but didn't see this, so I'm guessing this
> > isn't a GCC bug--the ARM world would be on fire if it were. And I've
> > searched the gcc command line options docs and the ARM architecture
> > docs to no avail. I'm hoping I didn't miss something obvious...
> >
> > So, does gcc assume or require that doubles on the stack be 64-bit
> > aligned, or is there an option we should be passing to either allow
> > 32-bit alignment or force 64-bit alignment, or is the MCU vendor's
> > startup code a wee bit buggy (this is what I suspect, but I wanted to
> > be damn sure before continuing)?
> >
>
> Your problem is a common one. GCC maintains 64-bit stack alignment in
> code, but it does not align the stack if the caller messes up. Your
> most likely problem is that the stack was not correctly aligned before
> calling main(). This is something the startup code must ensure when
> setting up the program environment.
>
> R.
>
> > Here's the test code:
> >
> > void va_args_test(int i, ...) {
> >     va_list args;
> >     va_start(args, i);
> >     double d = (int)va_arg(args, double);
> >     va_end(args);
> >     // display code elided
> > }
> >
> > Here's the generated assembly, with commentary mine:
> >
> > void va_args_test(int i, ...)
> > {
> >     3f60:  b40f        push   {r0, r1, r2, r3}
> >     3f62:  b580        push   {r7, lr}
> >     3f64:  b082        sub    sp, #8
> >     3f66:  af00        add    r7, sp, #0
> >
> >     va_list args;
> >     3f68:  2300        movs   r3, #0
> >     3f6a:  607b        str    r3, [r7, #4]
> >
> >     va_start(args, i);
> >     3f6c:  f107 0314   add.w  r3, r7, #20
> >     3f70:  607b        str    r3, [r7, #4]
> >
> >     double d = (int)va_arg(args, double);
> >     3f72:  f107 031b   add.w  r3, r7, #27   ; Loads the address of the last byte of the low order word into r3.
> >     3f76:  f023 0307   bic.w  r3, r3, #7    ; Clears the low 3 bits, which works when the double is 64-bit aligned. Not so much otherwise.
> >     3f7a:  f103 0208   add.w  r2, r3, #8    ; Increments args' internal pointer
> >     3f7e:  607a        str    r2, [r7, #4]  ; Saves that pointer
> >     3f80:  e9d3 0100   ldrd   r0, r1, [r3]  ; Reads the double, right or wrong...
> >
> > Here's the call site assembly:
> >
> >     va_args_test(0, (double)1.0);
> >     3fc2:  2200        movs   r2, #0
> >     3fc4:  4b09        ldr    r3, [pc, #36]  ; (3fec <main+0x44>)
> >     3fc6:  2000        movs   r0, #0
> >     3fc8:  4909        ldr    r1, [pc, #36]  ; (3ff0 <main+0x48>)
> >     3fca:  4788        blx    r1
> >
> > This is using GCC 12.3.0, cross-compiling for ARM on x86_64 (gcc -v
> > output below sig), with a command line like
> >
> > arm-none-eabi-gcc -o ../build/main/PAC5524/tmp/base/src/main.o
> > base/src/main.c <<-I options elided>>> -mcpu=cortex-m4 -march=armv7e-m
> > -mfpu=fpv4-sp-d16 -std=gnu99 -ffunction-sections -fno-omit-frame-pointer
> > -fno-strict-overflow -fsingle-precision-constant
> > -ftrivial-auto-var-init=zero -mthumb -mlittle-endian -mlong-calls
> > -mfloat-abi=hard -Og -c -MD -MP
> >
> > Removing any one of the -f options happens to align the stack correctly
> > in most cases (I've elided the -f options that don't affect this issue
> > as far as I can tell).
> >
> > Many thanks,
> >
> > Barrie
> >
> > gcc -v output:
> >
> > Using built-in specs.
> > COLLECT_GCC=arm-none-eabi-gcc
> > COLLECT_LTO_WRAPPER=/usr/share/arm-gnu-toolchain-12.3.rel1-x86_64-arm-none-eabi/bin/../libexec/gcc/arm-none-eabi/12.3.1/lto-wrapper
> > Target: arm-none-eabi
> > Configured with: /data/jenkins/workspace/GNU-toolchain/arm-12/src/gcc/configure
> >   --target=arm-none-eabi
> >   --prefix=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install
> >   --with-gmp=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> >   --with-mpfr=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> >   --with-mpc=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> >   --with-isl=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/host-tools
> >   --disable-shared --disable-nls --disable-threads --disable-tls
> >   --enable-checking=release --enable-languages=c,c++,fortran
> >   --with-newlib --with-gnu-as --with-gnu-ld
> >   --with-sysroot=/data/jenkins/workspace/GNU-toolchain/arm-12/build-arm-none-eabi/install/arm-none-eabi
> >   --with-multilib-list=aprofile,rmprofile
> >   --with-pkgversion='Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35)'
> >   --with-bugurl=https://bugs.linaro.org/
> > Thread model: single
> > Supported LTO compression algorithms: zlib
> > gcc version 12.3.1 20230626 (Arm GNU Toolchain 12.3.Rel1 (Build arm-12.35))
> >
> > Test code (the LED lights very prettily when va_arg() returns the
> > correct value):
> >
> > #include <stdarg.h>
> > #include <stdbool.h>
> >
> > void va_args_test(int i, ...) {
> >     va_list args;
> >     va_start(args, i);
> >     i = (int)va_arg(args, double);
> >     va_end(args);
> >     bal_init();               /* board support code, not shown */
> >     bal_set_AUX_LED1(i == 1);
> > }
> >
> > int main(void) {
> >     /* ...CPU initialization elided... */
> >     va_args_test(0, (double)1.0);
> >     while (true) {
> >     }
> > }