Problem with optimization setting -O3

Oliver Schneider <oliver@xxxxxxxxxx> · Tue, 25 May 2010 19:53:10 +0000

Hello,

when using -O3 with GCC 4.4.1, identified as
  gcc (Ubuntu 4.4.1-4ubuntu9) 4.4.1
there seems to be a problem that I haven't been able to reproduce in a
minimal example just yet.

So I'll try to explain the problem with some pseudo-code. Due to
endianness issues we have a macro GET32 that reads a 32-bit value stored
in one fixed byte order and returns it in the native byte order of the
machine it's running on. There is also a PUT32 macro that does the
inverse, writing a native 32-bit value to memory in that fixed byte order.
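The post does not show GET32 and PUT32 themselves. As a minimal sketch, assuming the stored data is big-endian (the post only says "one Endianess"), such accessors can assemble and split the value one byte at a time. Because they never dereference a `uint32_t *`, they place no alignment requirement on the buffer. The names `get32_be` and `put32_be` are hypothetical stand-ins, not the project's actual macros.

```c
#include <stdint.h>

/* Hypothetical stand-in for GET32: read a 32-bit value stored big-endian
 * (an assumption) and return it in the machine's native byte order.
 * Works at any address because it only performs single-byte loads. */
static uint32_t get32_be(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) |
           ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |
            (uint32_t)p[3];
}

/* Hypothetical stand-in for PUT32: the inverse of get32_be, storing the
 * native value v at p in big-endian byte order, one byte at a time. */
static void put32_be(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}
```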

Now we have various operations that are applied dword-wise to a buffer.

----------------------------
 pseudo-code
----------------------------
int filterfunc(..., uint8_t *data, size_t datalen, uint8_t* extra ...)
{
  uint32_t key;
  const uint8_t *data_max = data + datalen;
  /* inside a switch statement later on ... */
  case FILTER_SET: /* dword-wise "(mem)set" operation */
    /* check size to be a multiple of sizeof(uint32_t) etc. */
    key = GET32(extra);
    while(data < data_max)
    {
      PUT32(data, key);
      data += sizeof(uint32_t);
    }
    break;
  case FILTER_XOR: /* dword-wise "xor" operation */
    /* check size to be a multiple of sizeof(uint32_t) etc. */
    key = GET32(extra);
    while(data < data_max)
    {
      PUT32(data, GET32(data) ^ key);
      data += sizeof(uint32_t);
    }
    break;
  /* more similar operations */
}
----------------------------
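Assuming PUT32 is the exact inverse of GET32 (same stored byte order), both operations above reduce to byte-wise work: converting to native order and back is only a permutation of the four bytes, and XOR acts on each byte independently. So FILTER_SET copies the four key bytes at `extra` into every dword of `data`, and FILTER_XOR XORs byte i of `data` with key byte i % 4. The sketch below expresses that equivalence without any 32-bit loads or stores; `filter_set_bytes` and `filter_xor_bytes` are hypothetical names, and `datalen` is assumed to be a multiple of 4, as the pseudo-code's size check requires.

```c
#include <stddef.h>
#include <stdint.h>

/* Byte-wise equivalent of the FILTER_SET loop: every 4-byte group of data
 * receives a copy of the 4 key bytes at extra. Assumes datalen is a
 * multiple of 4. Places no alignment requirement on data or extra. */
static void filter_set_bytes(uint8_t *data, size_t datalen, const uint8_t *extra)
{
    for (size_t i = 0; i < datalen; i++)
        data[i] = extra[i % 4];
}

/* Byte-wise equivalent of the FILTER_XOR loop: byte i of data is XORed with
 * key byte i % 4. The result does not depend on which byte order GET32/PUT32
 * use, as long as both use the same one. */
static void filter_xor_bytes(uint8_t *data, size_t datalen, const uint8_t *extra)
{
    for (size_t i = 0; i < datalen; i++)
        data[i] ^= extra[i % 4];
}
```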

Now we got crashes in exactly this function (which is much bigger in
reality) with -O3, but not with lower optimization levels. The crashes,
which only occurred when targeting amd64, were caused by a misalignment
issue.

When single-stepping with GDB at the instruction level, we noticed that
GCC had decided to optimize the FILTER_SET operation to use double
quadwords (128-bit SSE registers).

This per se did not seem to be the problem. While single-stepping,
everything looked fine with respect to the fill pattern: the double
quadword register held the correct values right before the crash.
However, the pointer was never checked for 16-byte alignment, yet GCC
still "assumed" that alignment and used the MOVDQA instruction, which
faults on a memory operand that is not 16-byte aligned. That assumption
does not hold in all cases, and since GCC does not ensure the alignment
itself, it probably shouldn't make the assumption. Unlike optimized
memcpy() implementations, which take care of alignment themselves, our
code never asks for 128-bit chunks to be copied at a time, so one would
expect the optimization to remain safe in a case like this.
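For comparison when reading the disassembly: a common way for a vectorizer to handle a pointer of unknown alignment is loop peeling, i.e. running scalar iterations until the store address reaches a 16-byte boundary and only then switching to aligned 16-byte stores. The post reports no alignment check in the generated code, so this is not a claim about what GCC 4.4.1 emitted here; it is a conceptual C rendering of the technique, with hypothetical names `peel_dwords` and `vector_part_aligned`. It illustrates one property worth knowing: since each scalar iteration advances 4 bytes, such a prologue can only reach a 16-byte boundary if the pointer was 4-byte aligned to begin with.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of scalar dword iterations a peeling prologue would run before
 * switching to 16-byte stores: enough 4-byte steps to cover the distance to
 * the next 16-byte boundary (0 if dst is already 16-byte aligned). */
static size_t peel_dwords(const uint8_t *dst)
{
    size_t mis = (size_t)((uintptr_t)dst % 16);  /* bytes past a 16-byte boundary */
    return mis == 0 ? 0 : (16 - mis) / sizeof(uint32_t);
}

/* Nonzero if, after the prologue, the 16-byte (MOVDQA-style) stores would
 * start on a 16-byte boundary. This only holds when dst is 4-byte aligned. */
static int vector_part_aligned(const uint8_t *dst)
{
    const uint8_t *start = dst + peel_dwords(dst) * sizeof(uint32_t);
    return (uintptr_t)start % 16 == 0;
}
```

For a `data` pointer that is, say, 2 bytes past a 16-byte boundary, three 4-byte steps leave it 14 bytes past the boundary, so the first 16-byte store would still be misaligned.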

Now what we were wondering is whether this issue ...
* is an actual bug?
* is caused by a violation of the C standard in our code?
* could be solved by using MOVDQU instead?
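Regarding the MOVDQU question: the SSE2 intrinsic `_mm_storeu_si128` performs an unaligned 128-bit store (typically emitted as MOVDQU) and accepts any address. A minimal sketch of a dword fill written with it, assuming an x86-64 target where SSE2 is always available and falling back to `memcpy` elsewhere. `fill32_unaligned` is a hypothetical name, and the key is stored in native byte order, so any PUT32 byte-order conversion would have to be applied to it beforehand.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Fill n dwords starting at dst with key (native byte order) without
 * assuming any alignment of dst. */
static void fill32_unaligned(uint8_t *dst, size_t n, uint32_t key)
{
    size_t i = 0;
#if defined(__SSE2__)
    /* Four copies of key in one 128-bit register. */
    const __m128i v = _mm_set1_epi32((int)key);
    for (; i + 4 <= n; i += 4)
        _mm_storeu_si128((__m128i *)(dst + i * 4), v);  /* unaligned 16-byte store */
#endif
    /* Remaining dwords (or all of them without SSE2): memcpy has no
     * alignment requirement. */
    for (; i < n; i++)
        memcpy(dst + i * 4, &key, sizeof key);
}
```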

My superior (also a developer) jumped to the conclusion that this is a
compiler/optimizer error, but I'm hesitant to follow this judgment.

Thanks in advance for any insightful comments,

// Oliver