Maybe I should be posting this on the s390 listserv but this *seems* more like a gcc issue so here goes...

Compiling for s390x, the asm register constraints don't let you name specific registers the way the x86 ones do. But, because certain instructions work on pairs of consecutive registers, it's essential to be able to do this. The classic gcc hack to facilitate this is to declare a variable with the register keyword and an asm() clause naming a specific register, as in

    register uint64_t _foo asm("2") = something;

which then induces the asm block to use register 2 for _foo.

This works great when done inside a member function of a plain struct or class, but if I change the struct or class into a template the compiler stops paying attention to my register requests. This is true even when I don't use the template type for anything (though in my real app I want to).

The program below illustrates the problem. If I remove the "template<typename T>" and the template arguments from the DPointer variable declarations it all works great. Using a template, I get an error from the assembler:

    Fatal error: odd numbered general purpose register specified as register pair

because cdsg expects even-numbered registers for its register-pair operands and gcc has chosen to give it an odd register instead of the even ones I asked for. With sufficient fiddling I can make the compile-time error go away, but it's well nigh impossible to get gcc to cooperate and put the values into all the registers I need when using a template, so the best I can get is a runtime error.

Anyone have any thoughts/ideas on this? It seems to walk and quack like a bug but maybe there are caveats/tricks for templates and asm?

FWIW, I'm running version "(SUSE Linux) 4.8.5" on s390x, of course, as I think that's the preferred compiler for s390x. I could probably wrangle a newer one into place but I didn't see any reported issues/fixes that looked remotely like this problem, so I'm not sure it's worth the effort.
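One possible way to sidestep explicit register numbers altogether is to hand cdsg 128-bit operands and let gcc allocate the even/odd pairs itself. What follows is only a sketch under stated assumptions, not a verified fix: DPointer128 and its "both" member are hypothetical names, the s390x branch has not been tested on gcc 4.8.5, and it assumes that a 128-bit "d" operand is placed in an even/odd pair and printed as the even register. The #else branch is a non-atomic stand-in that exists only so the sketch builds on other platforms.

```cpp
#include <stdint.h>

// Hypothetical variant of DPointer: the two 64-bit words share storage with
// one 128-bit integer, so the asm takes a single 128-bit operand per value.
// Reading "both" after writing "ui" relies on gcc's union type punning.
template<typename T>
class __attribute__ (( __aligned__( 16 ) )) DPointer128 {
public:
    union {
        uint64_t ui[2];
        unsigned __int128 both;   // gcc extension, assumed available
    };

    bool cas(DPointer128 const& nval, DPointer128 const& cmp) {
        unsigned __int128 expected = cmp.both;
        unsigned __int128 old = expected;
#if defined(__s390x__)
        unsigned __int128 desired = nval.both;
        // Assumption: gcc assigns each 128-bit "d" operand to an even/odd
        // register pair and prints it as the even register, which is the
        // form cdsg requires. "QS" restricts the memory operand to
        // addresses without an index register.
        asm __volatile__ (
            "cdsg %[old],%[desired],%[mem]\n"
            : [old] "+d" (old), [mem] "+QS" (this->both)
            : [desired] "d" (desired)
            : "cc", "memory"
        );
#else
        // Portable stand-in for other targets: NOT atomic.
        old = this->both;
        if (old == expected)
            this->both = nval.both;
#endif
        // cdsg loads the current memory contents into "old" on a mismatch,
        // so the swap happened exactly when "old" still equals "expected".
        return old == expected;
    }
};
```

Usage mirrors the original cas(): fill in ui[0] and ui[1] on the anchor, old and new objects, then call anchor.cas(newAnchor, oldAnchor). Comparing the returned old value against the expected one replaces the ipm/srl sequence, so no condition-code extraction is needed.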
In any case, here's the code that demonstrates the problem (yes, it's silly and is only intended as a chopped down demonstration of the problem):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    template<typename T>
    class DPointer {
    public:
        uint64_t ui[2];

        bool cas(DPointer const& nval, DPointer const& cmp) {
            bool result;
            {
                register uint64_t _old0 asm("2") = cmp.ui[0];
                register uint64_t _old1 asm("3") = cmp.ui[1];
                register uint64_t _new0 asm("4") = nval.ui[0];
                register uint64_t _new1 asm("5") = nval.ui[1];
                asm __volatile__ (
                    "cdsg %2,%4,%1\n"
                    "ipm %0\n"
                    "srl %0,28\n"
                    : "=d" (result), "+m" (this->ui), "+d" (_old0), "+d" (_old1)
                    : "d" (_new0), "d" (_new1)
                    : "cc"
                );
            }
            return !result;
        }
    } __attribute__ (( __aligned__( 16 ) ));

    static DPointer<uint64_t> anchor = {0, 0};

    int main(int argc, char* argv[]) {
        uint64_t o1, o2, n1, n2;
        DPointer<uint64_t> oldAnchor = {anchor.ui[0], anchor.ui[1]};
        DPointer<uint64_t> newAnchor = {oldAnchor.ui[0] + 1, oldAnchor.ui[1] + 2};
        bool result = anchor.cas(newAnchor, oldAnchor);
        printf("%s: %ld %ld\n", result ? "true" : "false", anchor.ui[0], anchor.ui[1]);
        return 0;
    }

Thanks

---
Alex Kodat
Rocket Software