Calling convention weaknesses in 32-bit embedded ARM

Hi,

I work with embedded microcontroller systems - primarily based on 32-bit ARM Cortex-M devices. Efficiency of the generated code is important to me - it means I can use the clearest, safest high-level source code and rely on the tools to do the low-level optimisation.

One thing that sometimes hinders this is the calling conventions set by the CPU vendors. These were often designed in the days when everything was an "int", memory was fast, and 32 bits were enough for anyone, and are not optimal for modern usage.

A general point for efficiency on RISC processors is trying to avoid unnecessary stack usage. Some of the faster Cortex-M cores are now significantly faster than RAM, especially if off-chip RAM is used. Caches and tightly-coupled memories help, but the more you keep in registers, the better. Cortex-M cores are not like modern x86 cores that have store buffers and other features specifically optimising away the overhead of stack usage.

The 32-bit ARM EABI calls for an 8-byte aligned stack. That would have made sense for ancient ARM cores which did not support unaligned accesses and needed it for 64-bit doubles - AFAIK modern ARM cores all handle unaligned access for doubles and vectors without problems. (For devices with hardware double and/or vector support, such data would almost always be in registers or in non-stack data anyway.) 8-byte stack alignment is just a waste of RAM and cycles for half of the non-leaf functions in the program.


More important, however, is the failure to use registers properly for function returns. The EABI allows R0:R1 to be used for 64-bit integer types and 64-bit doubles (when hardware floating point registers are not available) - other than that, all types greater than 32 bits in size are returned in memory, normally in the caller's stack frame, through a hidden pointer passed in r0.

	typedef unsigned long long uint64;
	uint64 big1(void) { return 1; }

	typedef struct Uint64 { uint64 val; } Uint64;
	Uint64 big2(void) { return (Uint64) { 1 }; }

Compiles to:

big1:
        movs    r0, #1
        movs    r1, #0
        bx      lr
big2:
        movs    r2, #1
        movs    r3, #0
        strd    r2, [r0]
        bx      lr

(Code here was from godbolt.org, using ARM GCC 14.2.0 (unknown-eabi) with flags "-O2 -mcpu=cortex-m4".)


Simply wrapping the 64-bit integer type in a struct leads to using the stack for the return value. In some quick measurements I made on a 600 MHz Cortex-M7 device using tightly-coupled memory for the stack, the "struct" version took /16/ times as long as the R0:R1 return version - 80 cycles extra. Timings like this are influenced by many factors, but the overhead here is not insignificant.
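To make the difference concrete, the struct return behaves roughly as if the function had been written by hand with an explicit result pointer, which matches the "strd r2, [r0]" in the listing. This is only a sketch of that lowering - big2_lowered and caller are names made up for illustration, not anything the compiler emits:

```c
typedef unsigned long long uint64;
typedef struct Uint64 { uint64 val; } Uint64;

/* Hand-written equivalent of big2: the caller passes the address of the
   result object (in r0 on 32-bit ARM) and the callee stores through it. */
void big2_lowered(Uint64 *result)
{
    result->val = 1;
}

/* Hypothetical caller: it has to reserve memory for the result, pass its
   address, and then load the value back before it can use it. */
uint64 caller(void)
{
    Uint64 tmp;
    big2_lowered(&tmp);
    return tmp.val;
}
```

The store in the callee and the load in the caller are exactly the memory traffic that the R0:R1 return in big1 avoids.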

(For comparison, more modern ABIs like RISC-V and x86-64 will return structs in two registers where possible, including mixing integer and floating point registers where it makes sense.)


Small structs turn up regularly in modern coding, especially in newer C++. std::optional<>, std::variant<>, std::expected<> - these are all useful for safe coding, but have a significant unnecessary overhead. The same problem applies to strong type wrappers around 64-bit integers.
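For the strong type wrapper case, the only manual approach I know of is to keep the struct out of the function's external interface: the out-of-line function returns a plain 64-bit integer, and a small inline wrapper rebuilds the struct at the call site. The sketch below assumes the wrapper really is inlined (so the struct never crosses a call boundary); Ticks, ticks_now_raw and ticks_now are hypothetical names, and the returned value is a placeholder:

```c
#include <stdint.h>

/* Hypothetical strong type: a tick count that should not mix with plain integers. */
typedef struct Ticks { uint64_t val; } Ticks;

/* Out-of-line function. It crosses the call boundary as a plain 64-bit
   integer, so the EABI returns it in R0:R1. In a real project this would
   live in a separate .c file. */
uint64_t ticks_now_raw(void)
{
    return 123456789012345ULL;  /* placeholder value for this sketch */
}

/* Inline wrapper seen by the rest of the code. Once inlined, the struct
   exists only inside the caller and is never returned through memory. */
static inline Ticks ticks_now(void)
{
    Ticks t = { ticks_now_raw() };
    return t;
}
```

This is clumsy, has to be repeated for every such function, and does not help with library types like std::optional<>, which is why a proper ABI-level or compiler-level solution would be preferable.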


I can't see any good reason why all four scratch registers r0-r3 should not be used for return values.


I'm hoping to get some ideas or workarounds for this limitation. Maybe there are appropriate gcc options or function attributes that I haven't noticed. (There is plenty of precedent for different calling convention flags and function attributes in the x86 gcc port.) Failing that, it would be nice to have opinions on whether or not any of this would be a good idea. I don't imagine it would be trivial to implement these two suggestions (relaxing the 8-byte stack alignment, and returning small structs in r0-r3) - there's no point in filing a bugzilla feature request unless other people also think they would be useful.


David



