Hello,

when using -O3 with GCC 4.4.1, identified as

  gcc (Ubuntu 4.4.1-4ubuntu9) 4.4.1

there seems to be a problem that I haven't been able to reproduce in a minimal example just yet. So I'll try to explain the problem with some pseudo-code.

Due to endianness issues we have a macro GET32 that reads a 32-bit value, assuming one fixed endianness for the data being read, and yields it in the native endianness of the machine it's running on. There is a PUT32 macro as well, guaranteeing the inverse for write operations into a memory location.

Now we have various operations that are applied dword-wise to a buffer.

---------------------------- pseudo-code ----------------------------
int filterfunc(..., uint8_t *data, size_t datalen, uint8_t *extra ...)
{
    uint32_t key;
    const uint8_t *data_max = data + datalen;

    /* inside a switch statement later on ... */
    case FILTER_SET: /* dword-wise "(mem)set" operation */
        /* check size to be a multiple of sizeof(uint32_t) etc. */
        key = GET32(extra);
        while(data < data_max)
        {
            PUT32(data, key);
            data += sizeof(uint32_t);
        }
        break;
    case FILTER_XOR: /* dword-wise "xor" operation */
        /* check size to be a multiple of sizeof(uint32_t) etc. */
        key = GET32(extra);
        while(data < data_max)
        {
            PUT32(data, GET32(data) ^ key);
            data += sizeof(uint32_t);
        }
        break;
    /* more similar operations */
}
----------------------------

Now we got crashes in just this function (which is way bigger) with -O3, but not with lower optimization levels. The crashes (I have to add that they only occurred when targeting amd64) were caused by some misalignment issue.

When single-stepping with GDB at opcode level we noticed that GCC had decided to optimize the FILTER_SET operation to use double quadwords. This, per se, didn't seem to be the problem. While single-stepping, everything looked fine with respect to the filling pattern. The double-quadword register held the correct values before the crash.
However, the pointer wasn't checked for proper alignment, yet GCC still "assumed" proper alignment and used the MOVDQA instruction, which requires its memory operand to be 16-byte aligned. That assumption doesn't hold in all cases, and since GCC doesn't ensure the alignment itself, it probably shouldn't make it. Unlike some of the optimized memcpy() implementations, which make sure of the alignment first, our code does not by itself ask for copying 128-bit chunks at a time, so one would expect the optimization to remain safe in such a case.

Now what we were wondering is whether this issue ...

* is an actual bug?
* is caused by a violation of the C standard in our code?
* could be solved by using MOVDQU instead?

My superior (also a developer) jumped to the conclusion that this is a compiler/optimizer error, but I'm hesitant to follow this judgment.

Thanks in advance for any insightful comments,

// Oliver
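
P.S. To make the pseudo-code above a bit more concrete, here is a rough, compilable sketch of the FILTER_SET and FILTER_XOR loops. It is only an illustration, not the real code: the names get32_le, put32_le, filter_set and filter_xor are made up, the data is assumed to be little-endian, and the accessors deliberately work byte by byte, so they never dereference a uint32_t pointer into the buffer. The real GET32/PUT32 macros may well be implemented differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for GET32: reads 4 bytes as a little-endian value.
 * Byte-wise access, so 'p' may have any alignment. */
static uint32_t get32_le(const uint8_t *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Hypothetical stand-in for PUT32: writes 'v' as 4 little-endian bytes. */
static void put32_le(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v & 0xff);
    p[1] = (uint8_t)((v >> 8) & 0xff);
    p[2] = (uint8_t)((v >> 16) & 0xff);
    p[3] = (uint8_t)((v >> 24) & 0xff);
}

/* dword-wise "(mem)set" operation, mirroring FILTER_SET */
static void filter_set(uint8_t *data, size_t datalen, const uint8_t *extra)
{
    const uint8_t *data_max = data + datalen;
    uint32_t key = get32_le(extra);

    /* size must be a multiple of sizeof(uint32_t) */
    assert(datalen % sizeof(uint32_t) == 0);
    while (data < data_max) {
        put32_le(data, key);
        data += sizeof(uint32_t);
    }
}

/* dword-wise "xor" operation, mirroring FILTER_XOR */
static void filter_xor(uint8_t *data, size_t datalen, const uint8_t *extra)
{
    const uint8_t *data_max = data + datalen;
    uint32_t key = get32_le(extra);

    /* size must be a multiple of sizeof(uint32_t) */
    assert(datalen % sizeof(uint32_t) == 0);
    while (data < data_max) {
        put32_le(data, get32_le(data) ^ key);
        data += sizeof(uint32_t);
    }
}
```

With byte-wise accessors like these, the loops can be run on a buffer at any address, for example filter_set(buf + 1, 8, key) on a buffer that starts at an odd offset.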