Hi,

It seems that GCC configured for MIPS will always generate a call to
memcpy when optimizing for size. Is this expected behaviour? I have
encountered this both with a self-built GCC 4.8.1 (configured for
mipsel-sde-elf) and Microchip's XC32 compiler, which is based on
GCC 4.5.2.

Here's what I get when building the following code with GCC 4.8.1 and
-march=m4k:

    #include <stdint.h>

    uint32_t foo(const uint8_t *pA)
    {
        uint32_t sum = 0;
        uint32_t tmp, ii;

        for (ii = 0; ii < 256; ii++) {
            __builtin_memcpy(&tmp, &pA[ii*sizeof(tmp)], sizeof(tmp));
            sum += tmp;
        }

        return sum;
    }

With -O1, the following is generated:

    00000000 <foo>:
       0:   27bdfff8    addiu   sp,sp,-8
       4:   00002821    move    a1,zero
       8:   00001021    move    v0,zero
       c:   24070400    li      a3,1024
      10:   00851821    addu    v1,a0,a1
      14:   88660003    lwl     a2,3(v1)
      18:   98660000    lwr     a2,0(v1)
      1c:   afa60000    sw      a2,0(sp)
      20:   24a50004    addiu   a1,a1,4
      24:   14a7fffa    bne     a1,a3,10 <foo+0x10>
      28:   00461021    addu    v0,v0,a2
      2c:   03e00008    jr      ra
      30:   27bd0008    addiu   sp,sp,8

But with -Os, I get this:

    00000000 <foo>:
       0:   27bdffd0    addiu   sp,sp,-48
       4:   afb30028    sw      s3,40(sp)
       8:   afb20024    sw      s2,36(sp)
       c:   afb10020    sw      s1,32(sp)
      10:   afb0001c    sw      s0,28(sp)
      14:   afbf002c    sw      ra,44(sp)
      18:   00809821    move    s3,a0
      1c:   00008021    move    s0,zero
      20:   00008821    move    s1,zero
      24:   24120400    li      s2,1024
      28:   02702821    addu    a1,s3,s0
      2c:   27a40010    addiu   a0,sp,16
      30:   0c000000    jal     0 <foo>
      34:   24060004    li      a2,4
      38:   8fa20010    lw      v0,16(sp)
      3c:   26100004    addiu   s0,s0,4
      40:   1612fff9    bne     s0,s2,28 <foo+0x28>
      44:   02228821    addu    s1,s1,v0
      48:   8fbf002c    lw      ra,44(sp)
      4c:   02201021    move    v0,s1
      50:   8fb30028    lw      s3,40(sp)
      54:   8fb20024    lw      s2,36(sp)
      58:   8fb10020    lw      s1,32(sp)
      5c:   8fb0001c    lw      s0,28(sp)
      60:   03e00008    jr      ra
      64:   27bd0030    addiu   sp,sp,48

-O2 produces identical code to -O1, modulo allocated registers and
scheduling.

As a side note, the store of tmp to the stack in the -O1 code (the sw
at offset 0x1c) is unnecessary and could be optimized away.

Regards,
Anders Montonen

(I am not subscribed to the list, so please cc me)