As a complement to my previous message: it appears that the following
C source leads to much better code:
---------8<-----------------------------------
void f(uint32_t i) {
union __attribute__((__packed__)) {
uint32_t i;
struct S { uint8_t a,b,c,d; } s;
} u;
u.i = i | ((uint32_t)(0xFF) << 16);
DDRA = (uint32_t)(u.s.d);
DDRA = (uint32_t)(u.s.c);
DDRA = (uint32_t)(u.s.b);
DDRA = (uint32_t)(u.s.a);
}
---------8<-----------------------------------
Output of avr-objdump:
---------8<-----------------------------------
union __attribute__((__packed__)) {
uint32_t i;
struct S { uint8_t a,b,c,d; } s;
} u;
u.i = i | ((uint32_t)(0xFF) << 16);
58: af 6f ori r26, 0xFF ; 255
DDRA = (uint32_t)(u.s.d);
5a: ba bb out 0x1a, r27 ; 26
DDRA = (uint32_t)(u.s.c);
5c: aa bb out 0x1a, r26 ; 26
DDRA = (uint32_t)(u.s.b);
5e: 9a bb out 0x1a, r25 ; 26
DDRA = (uint32_t)(u.s.a);
60: 8a bb out 0x1a, r24 ; 26
---------8<-----------------------------------
*But* the C code is no longer portable, since I'm using the GCC-specific
"__attribute__((__packed__))". Moreover, it assumes a particular
endianness (here, that 'a' is the least significant byte of 'i').
That's why I was hoping for a command-line option that would let gcc
perform the same optimization on the original source.
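To make the endianness assumption concrete, here is a minimal host-side sketch. It is not part of the code I compiled for the AVR. The names byte_of, union view and union_matches_shifts are made up for this illustration. byte_of extracts a byte with shifts only, so it gives the same result on any host. union_matches_shifts reports whether the union layout agrees with it, which I would expect only on a little-endian target. I make no claim about the code avr-gcc generates for byte_of.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns byte n of v, where n = 0 is the least
 * significant byte. Uses shifts only, so it is byte-order independent. */
static uint8_t byte_of(uint32_t v, unsigned n) {
    assert(n < 4); /* shifting a 32-bit value by 32 or more is undefined */
    return (uint8_t)(v >> (8u * n));
}

/* Same layout idea as the union above, without the packed attribute.
 * Assumption: four uint8_t members are laid out without padding. */
union view {
    uint32_t i;
    struct { uint8_t a, b, c, d; } s;
};

/* Returns 1 if 'a' is the least significant byte of 'i' (little-endian
 * layout, as assumed by the union version), 0 otherwise. */
static int union_matches_shifts(uint32_t v) {
    union view u;
    u.i = v;
    return u.s.a == byte_of(v, 0) && u.s.b == byte_of(v, 1)
        && u.s.c == byte_of(v, 2) && u.s.d == byte_of(v, 3);
}
```

For n = 0x01020304, the OR with 0xFF << 16 gives 0x01FF0304, so byte_of returns 0x04, 0x03, 0xFF and 0x01 for n = 0..3. On a big-endian host union_matches_shifts would return 0 for that value, which is exactly the portability problem with the union version.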
- Sylvain
On 08/30/2012 02:54 PM, Sylvain Leroux wrote:
Hi,
It seems to me that avr-gcc/avr-g++ is producing sub-optimal code for
the 'f' function in the following source code:
---------8<-----------------------------------
#include <avr/io.h>
void f(uint32_t i) {
i |= ((uint32_t)(0xFF) << 16);
/* DDRA is an 8 bit register */
DDRA = (uint32_t)(i);
DDRA = (uint32_t)(i>>8);
DDRA = (uint32_t)(i>>16);
DDRA = (uint32_t)(i>>24);
}
int main() {
volatile uint32_t n = 0x01020304;
f(n);
}
---------8<-----------------------------------
Having compiled with the following options:
avr-gcc c.c -mmcu=attiny2313
-Os -ffunction-sections -fdata-sections
-g -Wl,--gc-sections -Wl,--print-gc-sections
-fipa-cp -fcprop-registers -fweb
... here is the relevant fragment as displayed by avr-objdump. I marked
with a star (*) all the instructions that appear to be useless:
---------8<-----------------------------------
void f(uint32_t i) {
i |= ((uint32_t)(0xFF) << 16);
34: 8f 6f ori r24, 0xFF ; 255
DDRA = (uint32_t)(i);
36: 6a bb out 0x1a, r22 ; 26
DDRA = (uint32_t)(i>>8);
38: 27 2f mov r18, r23
* 3a: 38 2f mov r19, r24
* 3c: 49 2f mov r20, r25
* 3e: 55 27 eor r21, r21
40: 2a bb out 0x1a, r18 ; 26
DDRA = (uint32_t)(i>>16);
42: 9c 01 movw r18, r24
* 44: 44 27 eor r20, r20
* 46: 55 27 eor r21, r21
48: 2a bb out 0x1a, r18 ; 26
DDRA = (uint32_t)(i>>24);
4a: 69 2f mov r22, r25
* 4c: 77 27 eor r23, r23
* 4e: 88 27 eor r24, r24
* 50: 99 27 eor r25, r25
52: 6a bb out 0x1a, r22 ; 26
}
54: 08 95 ret
---------8<-----------------------------------
Both gcc and g++ produce the same code, and I get the same results with
both 4.3.5 and 4.7.1.
Here is my question:
Are there any options that would keep gcc from producing those extra
instructions in such a case?
Regards,
- Sylvain
--
-- Sylvain Leroux
-- sylvain@xxxxxxxxxxx
-- http://www.chicoree.fr