I'm trying to create some template specializations to combine short movs into larger movs. A simple example might be:

    struct SPoint {
        uint16_t x;
        uint16_t y;
        inline void operator= (const SPoint& v) { x = v.x; y = v.y; }
    };

I then try to implement something like this:

    inline void SPoint::operator= (const SPoint& v)
    {
        *reinterpret_cast<uint32_t*>(&x) = *reinterpret_cast<const uint32_t*>(&v.x);
    }

The point structure is very convenient to use and putting hacks like this in results in a very useful code size reduction without having to write everything in assembly. Unfortunately, the above code breaks aliasing rules and doesn't compile. So I'm using the union cast hack to get around it like this:

    template <typename DEST, typename SRC>
    inline DEST noalias_cast (SRC s)
    {
        asm(""::"g"(s));
        union { SRC s; DEST d; } u = {s};
        return (u.d);
    }

    inline void SPoint::operator= (const SPoint& v)
    {
        *noalias_cast<uint32_t*>(&x) = *noalias_cast<const uint32_t*>(&v.x);
    }

This works just fine, except in some cases when optimization is turned on. The optimizer doesn't see what I'm doing, and feels free to rearrange instructions all around it, instantiating points in the wrong place and reading from uninitialized memory.

Next I tried an explicit asm touch, just like the one in noalias_cast, and for the same reasons:

    inline void SPoint::operator= (const SPoint& v)
    {
        asm (""::"m"(x),"m"(y));
        asm (""::"m"(v.x),"m"(v.y));
        *noalias_cast<uint32_t*>(&x) = *noalias_cast<const uint32_t*>(&v.x);
        asm ("":"=m"(x),"=m"(y));
    }

This does help in the majority of cases, but sometimes the optimizer still screws things up.

One possibility, of course, is to combine the whole thing into a single asm statement. Unfortunately, it does not work in the general case; I have tuples of various sizes of up to eight elements that get the above treatment (see http://ustl.svn.sourceforge.net/viewvc/ustl/trunk/utuple.h?revision=445&view=markup) and the assembler runs out of parameters.
Each parameter, even an "m" one, counts toward the register allocation limit, and if you have more than eight arguments, the asm block fails to compile. (That's probably a bug.)

So my question is, how can I resolve this problem? Is there some way to tell the optimizer to keep the asm blocks together and not to interleave anything between them? volatile doesn't do it. Is there some other way to tie an aliased pointer to the target variable so that the compiler would know what's being accessed? Or any other ideas?

-- 
Mike
msharov@xxxxxxxxxxxxxxxxxxxxx
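
For reference, the kind of "tie the access to the target variable" I have in mind could perhaps be sketched with a fixed-size memcpy. This is only a sketch under assumptions: SPoint has no padding (so it is exactly 32 bits), and the helper name copy_point is made up for illustration. Compilers commonly lower a constant 4-byte memcpy to a single 32-bit load and store, though that is not guaranteed, and since the access goes through memcpy the optimizer can see which bytes are read and written.

```cpp
#include <cstdint>
#include <cstring>

struct SPoint {
    uint16_t x;
    uint16_t y;
};

// Assumption: the two 16-bit members pack into exactly 32 bits.
static_assert (sizeof (SPoint) == sizeof (uint32_t), "SPoint must be 32 bits wide");

// Hypothetical helper, not part of the original code.
// Copies the whole point as one 32-bit value without casting pointers,
// so no strict-aliasing rule is violated.
inline void copy_point (SPoint& d, const SPoint& s)
{
    uint32_t tmp;
    std::memcpy (&tmp, &s, sizeof (tmp));   // read both members at once
    std::memcpy (&d, &tmp, sizeof (tmp));   // write both members at once
}
```

operator= could then simply call copy_point (*this, v). Whether the result is as small as the hand-combined mov would have to be checked in the generated assembly.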