Re: Question on strict aliasing in C.

David Brown <david@xxxxxxxxxxxxxxx> · Tue, 05 Jul 2011 10:08:03 +0200

On 04/07/2011 18:57, Georg-Johann Lay wrote:
Hi,

this it not a question on GCC, so I apologize for asking a question on
C strict aliasing rules here. As I know that some people reading this
list are much more familiar with C standard than I am, allow me to ask
that question, anyway.

Suppose the following C code that tries to implement the standard
copysign function, i.e. copy the sign of y into x and return that
value. double be 64 bits wide:

#define MASK ((unsigned short) 0x8000)

double copysign (double x, double y)
{
     unsigned short * const px = (unsigned short*)(char*)&x + 3;
     unsigned short * const py = (unsigned short*)(char*)&y + 3;

     *px = *px&  ~MASK | *py&  MASK;
     return x;
}

While I say that this code is not correct because it breaks C's strict
aliasing rules (e.g C89/90, Chapter 6.3; C98/99, Chapter 6.5, Clause
7), some other person very well familiar with the standard claims that
is correct and no problem.

So I want to reassure me if the code is ok or not.

I am aware of gcc's -f[no]-strict-aliasing switch, but that's rather
technical detail to cope with such presumably incorrect code.

Thanks,

Johann

BTW, gcc does just what I would expect: it returns x unchanged and
that is not an "optimizer bug".

Does it get compiled to an empty function?

Anyway, you are correct that strict aliasing means that this function is 
buggy - the compiler can assume that a "double" such as x or y cannot 
share the same address as a "short" (or "unsigned short"), thus the 
pointer "px" cannot point to x, thus x is not changed by the function, 
and can be returned unchanged.

The extra "(char*)" in the cast may be an attempt to work around 
aliasing, but it won't work.

In other words, the function is buggy and unclear - never a good 
combination.  And for the 8-bit AVR (I'm guessing that's the target you 
are interested in...), it is inefficient as well.

The code assumes 8-byte doubles, 16-bit shorts and little-endian 
ordering.  Obviously that's a common combination, but it's not the only 
possibility - and IIRC the AVR uses 32-bit doubles at the moment (and 
some embedded targets allow 32-bit doubles as an option for efficiency).

It would be /far/ better to stick to char pointers - these may alias 
anything, and are more efficient on an 8-bit cpu (without being /too/ 
bad on most other targets).  The alternative would be to use union types 
to do the casting - that would make more sense on 32-bit RISC targets 
without direct byte operations (such as earlier ARMs).

double copysign (double x, double y)
{
     unsigned char * const px = (unsigned char*)(&x + 1) - 1;
     const unsigned char * const py = (unsigned char*)(&y + 1) - 1;

     *px = (*px & ~0x80) | (*py & 0x80);
     return x;
}

This should make px and py point to the last bytes of x and y, 
regardless of the size of a double.  I've added a couple more brackets, 
and removed the MASK macro - that's just wanton name-space pollution.

It's still little-endian only, of course.