What's the right way of writing the 'compute a*b%c' intrinsic? I can do

```c
#define mulmod(a,b,c,d,e) { asm ("mul %%rbx; div %%rsi;" : "=d" (d), "=a" (e) : "a" (a), "b" (b), "S" (c));}
```

but that causes quite a lot of ugly register-shuffling. If I do

```c
#define mulmod(a,b,c,d,e) { asm ("mul %0; div %1;" : "=d" (d), "=a" (e) : "a" (a), "r" (b), "r" (c));}
```

then the program doesn't work. Disassembling

```c
u64 zul(u64 a, u64 b, u64 p) { u64 x,y; mulmod(a,b,p,x,y); return x; }
```

gives

```
00000000004005b0 <_Z3zulyyy>:
  4005b0:  48 89 f8    mov  %rdi,%rax
  4005b3:  48 f7 e2    mul  %rdx
  4005b6:  48 f7 f0    div  %rax
  4005b9:  48 89 d0    mov  %rdx,%rax
  4005bc:  c3          retq
```

which is clearly crazy, since %rax from 4005b0 will have been overwritten by the mul instruction before it's used.

Is there a good way of encoding 'any register other than rax or rdx'?

Many thanks in advance,
Tom
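A possible sketch, assuming GCC or Clang extended asm targeting x86-64. Two details seem relevant to the disassembly above. First, extended-asm operands are numbered outputs first, then inputs, so in the second macro `%0` is `d` (RDX) and `%1` is `e` (RAX). That is exactly the `mul %rdx; div %rax` that appears; `b` and `c` would have been `%3` and `%4`. Second, the early-clobber modifier `&` on an output tells the compiler that the register is written before all inputs have been read, so no input (or memory-address register) may be placed in it. That is one way to say "any register other than rax or rdx". The function name `mulmod_u64` and its precondition checks are hypothetical, not part of the original macro. `DIV` raises a divide error when the quotient does not fit in 64 bits, hence the `a < c` requirement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns a*b mod c using x86-64 MUL/DIV.
 * GCC/Clang extended asm, x86-64 only.
 * Precondition: c != 0 and a < c. Then a*b < c * 2^64, so the quotient
 * fits in 64 bits and DIV cannot fault on overflow. */
static inline uint64_t mulmod_u64(uint64_t a, uint64_t b, uint64_t c)
{
    uint64_t q = a; /* RAX: multiplicand in, quotient out (discarded) */
    uint64_t r;     /* RDX: high half of the product, then the remainder */

    assert(c != 0 && a < c);

    /* Named operands [b] and [c] avoid counting %0, %1, ... by hand. */
    __asm__("mulq %[b]\n\t" /* RDX:RAX = RAX * b                  */
            "divq %[c]"     /* RAX = RDX:RAX / c, RDX = RDX:RAX % c */
            /* '&' (early clobber): RAX and RDX are written before b and c
             * are consumed, so neither input may be allocated to them. */
            : "+&a"(q), "=&d"(r)
            : [b] "rm"(b), [c] "rm"(c)
            : "cc");
    return r;
}
```

With this, `zul` could simply be `return mulmod_u64(a, b, p);`. The `"rm"` constraints leave the compiler free to use any other register, or a memory operand, for `b` and `c`, which should avoid the fixed `rbx`/`rsi` shuffling of the first macro. A portable cross-check on compilers that support it is `(uint64_t)((unsigned __int128)a * b % c)`.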