tom@xxxxxxxxxx writes:
> What's the right way of writing the 'compute a*b%c' intrinsic?
>
> I can do
>
> #define mulmod(a,b,c,d,e) { asm ("mul %%rbx; div %%rsi;" : "=d" (d), "=a" (e) : "a" (a), "b" (b), "S" (c));}
>
> but that causes quite a lot of ugly register-shuffling
>
> If I do
>
> #define mulmod(a,b,c,d,e) { asm ("mul %0; div %1;" : "=d" (d), "=a" (e) : "a" (a), "r" (b), "r" (c));}
>
> then the program doesn't work; disassembling
>
> u64 zul(u64 a, u64 b, u64 p)
> {
>   u64 x,y;
>   mulmod(a,b,p,x,y);
>   return x;
> }
>
> gives
>
> 00000000004005b0 <_Z3zulyyy>:
>   4005b0:  48 89 f8    mov    %rdi,%rax
>   4005b3:  48 f7 e2    mul    %rdx
>   4005b6:  48 f7 f0    div    %rax
>   4005b9:  48 89 d0    mov    %rdx,%rax
>   4005bc:  c3          retq
>
> which is clearly crazy since %rax from 4005b0 will have been
> overwritten by the mul command before it's used.

#define mulmod(a,b,c,d,e)                       \
{                                               \
  asm ("mul %3\n\tdiv %4\n"                     \
       : "=&d" (d), "=a" (e)                    \
       : "1" (a), "g" (b), "g" (c));            \
}

Andrew.
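
For reference, the same idea can be sketched as an inline function. This is not part of the original reply, only a sketch under stated assumptions: the name mulmod_u64 is hypothetical, and it assumes GCC or a compatible compiler. Operand numbers count the outputs first, which is why %0 and %1 in the second macro of the question came out as %rdx and %rax; the sketch uses named operands to sidestep that. It also uses "rm" rather than "g" (mul and div take no immediate operand), writes the q suffix explicitly so a memory operand has an unambiguous size, and marks %rax earlyclobber as well so the divisor cannot be allocated to it. On targets other than x86-64 it falls back to unsigned __int128.

```c
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical helper: returns (a * b) % c using the full 128-bit product.
 * Preconditions: c != 0 and the quotient must fit in 64 bits (true whenever
 * a < c or b < c); otherwise div raises a divide error. */
static inline u64 mulmod_u64(u64 a, u64 b, u64 c)
{
#if defined(__GNUC__) && defined(__x86_64__)
    u64 rem;
    u64 lo = a;                /* rax: multiplicand in, quotient out */
    __asm__ ("mulq %[b]\n\t"   /* rdx:rax = rax * b */
             "divq %[c]"       /* rax = quotient, rdx = remainder */
             : [rem] "=&d" (rem), [lo] "+&a" (lo)  /* both written before c is read */
             : [b] "rm" (b), [c] "rm" (c)          /* register or memory, no immediates */
             : "cc");                              /* mul and div modify the flags */
    return rem;
#else
    /* Assumed fallback for other 64-bit targets with __int128 support. */
    return (u64) (((unsigned __int128) a * b) % c);
#endif
}
```

With this, the zul() function from the question would reduce to "return mulmod_u64(a, b, p);", with the quotient simply discarded.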