Re: Optimizing 32-bit integer manipulation on 8-bit AVR target

Sylvain Leroux <sylvain@xxxxxxxxxxx> · Thu, 30 Aug 2012 15:05:11 +0200

As a follow-up to my previous message:

It appears the following C source leads to much better code:

---------8<-----------------------------------
#include <avr/io.h>

void f(uint32_t i) {
    union __attribute__((__packed__)) {
        uint32_t i;
        struct S { uint8_t a,b,c,d; } s;
    } u;

    u.i = i | ((uint32_t)(0xFF) << 16);

    DDRA = (uint32_t)(u.s.d);
    DDRA = (uint32_t)(u.s.c);
    DDRA = (uint32_t)(u.s.b);
    DDRA = (uint32_t)(u.s.a);
}
---------8<-----------------------------------

avr-objdump:
---------8<-----------------------------------
    union __attribute__((__packed__)) {
        uint32_t i;
        struct S { uint8_t a,b,c,d; } s;
    } u;

    u.i = i | ((uint32_t)(0xFF) << 16);
  58:   af 6f           ori     r26, 0xFF       ; 255

    DDRA = (uint32_t)(u.s.d);
  5a:   ba bb           out     0x1a, r27       ; 26
    DDRA = (uint32_t)(u.s.c);
  5c:   aa bb           out     0x1a, r26       ; 26
    DDRA = (uint32_t)(u.s.b);
  5e:   9a bb           out     0x1a, r25       ; 26
    DDRA = (uint32_t)(u.s.a);
  60:   8a bb           out     0x1a, r24       ; 26
---------8<-----------------------------------

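*But* the C code is no longer portable, since it relies on the
GCC-specific "__attribute__((__packed__))". Moreover, it depends on
knowing (or assuming) the target's byte order: u.s.d is the most
significant byte only on a little-endian target.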
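The endianness dependence can be made concrete with a small host-side sketch (plain C, not AVR-specific) that compares the union's in-memory byte view with shift-based extraction. The names word32, byte_by_shift, byte_by_union, union_matches_shift and host_is_little_endian are hypothetical. The sketch drops __attribute__((__packed__)), assuming, as is the case with GCC in practice, that a struct of four uint8_t members has no padding.

```c
#include <stdint.h>
#include <string.h>

/* Same idea as the union above, minus the GCC-specific packed attribute. */
union word32 {
    uint32_t i;
    struct { uint8_t a, b, c, d; } s;
};

/* Portable: byte n (0 = least significant) regardless of host byte order. */
static uint8_t byte_by_shift(uint32_t v, unsigned n) {
    return (uint8_t)(v >> (8u * n));
}

/* Non-portable: byte n as laid out in memory (n is expected to be 0..3);
   which byte of the value this is depends on the host's endianness. */
static uint8_t byte_by_union(uint32_t v, unsigned n) {
    union word32 u;
    u.i = v;
    switch (n) {
    case 0:  return u.s.a;
    case 1:  return u.s.b;
    case 2:  return u.s.c;
    default: return u.s.d;
    }
}

/* Returns 1 if the union view and the shift view agree on all four bytes. */
static int union_matches_shift(uint32_t v) {
    for (unsigned n = 0; n < 4; n++)
        if (byte_by_union(v, n) != byte_by_shift(v, n))
            return 0;
    return 1;
}

/* Returns 1 if the lowest-addressed byte of a uint32_t is its LSB. */
static int host_is_little_endian(void) {
    uint32_t one = 1;
    uint8_t first;
    memcpy(&first, &one, 1);
    return first == 1;
}
```

On a little-endian machine, including AVR, union_matches_shift() returns 1, i.e. u.s.a is the same value as (uint8_t)i; on a big-endian machine the two views disagree. The shift-based version yields the same bytes everywhere, which is why it is the portable formulation.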

That's why I was hoping for a command-line option that would let gcc
perform the same optimization on the portable, shift-based code.

- Sylvain

On 08/30/2012 02:54 PM, Sylvain Leroux wrote:
Hi,

It seems to me that avr-gcc/avr-g++ is producing sub-optimal code for
the 'f' function in the following source code:

---------8<-----------------------------------
#include <avr/io.h>

void f(uint32_t i) {
    i |= ((uint32_t)(0xFF) << 16);

    /* DDRA is an 8-bit register */
    DDRA = (uint32_t)(i);
    DDRA = (uint32_t)(i>>8);
    DDRA = (uint32_t)(i>>16);
    DDRA = (uint32_t)(i>>24);
}

int main() {
    volatile uint32_t n = 0x01020304;

    f(n);
}
---------8<-----------------------------------
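To check what f() actually writes without an AVR at hand, a host-side model can replace DDRA with a logging function. This is only a sketch: ddra_write, ddra_log, ddra_writes and f_model are hypothetical names, and the explicit (uint8_t) casts stand in for the truncation that happens when a 32-bit value is stored into the 8-bit DDRA register.

```c
#include <stdint.h>

/* Hypothetical stand-in for the 8-bit DDRA register: writes are logged. */
static uint8_t ddra_log[4];
static unsigned ddra_writes;

static void ddra_write(uint8_t v) {
    if (ddra_writes < 4)
        ddra_log[ddra_writes] = v;
    ddra_writes++;
}

/* Same computation as f(); only the low byte of each shifted value
   reaches the (modeled) 8-bit register. */
static void f_model(uint32_t i) {
    i |= (uint32_t)0xFF << 16;
    ddra_write((uint8_t)(i));
    ddra_write((uint8_t)(i >> 8));
    ddra_write((uint8_t)(i >> 16));
    ddra_write((uint8_t)(i >> 24));
}
```

With n = 0x01020304 as in main(), the OR turns the value into 0x01FF0304, so the model logs 0x04, 0x03, 0xFF, 0x01 in that order. Since only the low byte of each shifted temporary is ever stored, computing its upper bytes has no observable effect, which is consistent with the starred instructions in the listing being unnecessary.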
I compiled it with the following options:
avr-gcc c.c -mmcu=attiny2313 \
    -Os -ffunction-sections -fdata-sections \
    -g -Wl,--gc-sections -Wl,--print-gc-sections \
    -fipa-cp -fcprop-registers -fweb

Here is the relevant fragment as displayed by avr-objdump. I marked
with a star (*) every instruction that appears to be useless:
---------8<-----------------------------------
void f(uint32_t i) {
    i |= ((uint32_t)(0xFF) << 16);
  34:   8f 6f           ori     r24, 0xFF       ; 255

    DDRA = (uint32_t)(i);
  36:   6a bb           out     0x1a, r22       ; 26
    DDRA = (uint32_t)(i>>8);
  38:   27 2f           mov     r18, r23
* 3a:   38 2f           mov     r19, r24
* 3c:   49 2f           mov     r20, r25
* 3e:   55 27           eor     r21, r21
  40:   2a bb           out     0x1a, r18       ; 26
    DDRA = (uint32_t)(i>>16);
  42:   9c 01           movw    r18, r24
* 44:   44 27           eor     r20, r20
* 46:   55 27           eor     r21, r21
  48:   2a bb           out     0x1a, r18       ; 26
    DDRA = (uint32_t)(i>>24);
  4a:   69 2f           mov     r22, r25
* 4c:   77 27           eor     r23, r23
* 4e:   88 27           eor     r24, r24
* 50:   99 27           eor     r25, r25
  52:   6a bb           out     0x1a, r22       ; 26
}
  54:   08 95           ret
---------8<-----------------------------------

Both gcc and g++ produce the same code, and I get the same results with
both 4.3.5 and 4.7.1.

Here is my question:
Is there any option (or combination of options) that would keep gcc
from producing those extra instructions in such a case?

Regards,
- Sylvain

--
-- Sylvain Leroux
-- sylvain@xxxxxxxxxxx
-- http://www.chicoree.fr