Hello,
I have created a 32-bit type called Coords, consisting of two 16-bit
values, x and y. I then wrote an ordering functor for use with the
standard containers. It just compares the two objects as 32-bit values,
which I expected to be the fastest way to do it. Here is the code:
#include <SDL_types.h>

struct Coords {
    struct ordering_functor {
        bool operator()(const Coords a, const Coords b) const;
    };
    int x : 16, y : 16;
};

bool Coords::ordering_functor::operator()(const Coords a, const Coords b) const {
    return *reinterpret_cast<const Uint32 * const>(&a)
         < *reinterpret_cast<const Uint32 * const>(&b);
}
This works great for me on a Pentium Mobile with gcc-3.4.5, but someone
else with a dual-core Athlon 64 and gcc-4.0.3 has problems. It works for
him without optimization, but fails with optimization. I asked him for
the assembly output of a slow (unoptimized) build and an optimized build.
This is what he sent me:
SLOW:
______________________________________________________________________________________
.LFB2:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
leal 12(%ebp), %eax
movl (%eax), %edx
leal 16(%ebp), %eax
movl (%eax), %eax
cmpl %eax, %edx
setb %al
movzbl %al, %eax
popl %ebp
ret
______________________________________________________________________________________
OPTIMIZED:
______________________________________________________________________________________
.LFB2:
pushl %ebp
.LCFI0:
movl %esp, %ebp
.LCFI1:
movl 16(%ebp), %eax
cmpl %eax, 12(%ebp)
popl %ebp
setb %al
movzbl %al, %eax
ret
______________________________________________________________________________________
What should be done about this? (I cannot use a union, because I need to have a constructor for Coords.)
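One alternative I could try, in case the pointer cast is what the optimizer trips over (reading a Coords through a Uint32 pointer breaks the language's aliasing rules, so this seems plausible but I have not confirmed it): copy the bytes into an integer with memcpy and compare those. This is only a minimal sketch. The helper name coords_bits and the constructor are made up for illustration, the functor takes its arguments by const reference, and it uses std::uint32_t from <cstdint> (which needs a C++11 compiler) in place of SDL's Uint32 so that it stands alone.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

struct Coords {
    struct ordering_functor {
        bool operator()(const Coords& a, const Coords& b) const;
    };
    // Illustrative constructor (assumption: the real one may differ).
    Coords(int x_, int y_) : x(x_), y(y_) {}
    int x : 16, y : 16;
};

// Hypothetical helper: copies the object's bytes into a 32-bit integer.
// Unlike the reinterpret_cast, memcpy does not read the object through
// a pointer of an unrelated type.
inline std::uint32_t coords_bits(const Coords& c) {
    // Assumes both bit-fields are packed into a single 32-bit unit.
    assert(sizeof(Coords) == sizeof(std::uint32_t));
    std::uint32_t bits;
    std::memcpy(&bits, &c, sizeof bits);
    return bits;
}

bool Coords::ordering_functor::operator()(const Coords& a, const Coords& b) const {
    // Same comparison as before, on the copied bit patterns.
    return coords_bits(a) < coords_bits(b);
}
```

The resulting order still depends on how the compiler lays out the bit-fields, so it is only meant as a consistent ordering for containers, not as a meaningful sort by x and then y.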
Thanks in advance,
Erik