Rask Ingemann Lambertsen wrote:
On Mon, Aug 27, 2007 at 06:11:04AM +0100, Darryl L. Miles wrote:
#define U64_DIVIDE_ASM(quotient, remainder, dividend, divisor, overflow) do { \
__asm__ __volatile__( \
"\n\t" \
"xorl %0,%0\n\t" \
"divq %5\n\t" \
\
"jnc 1f\n\t" \
"incl %0\n" \
"1:\n\t" \
"movq %%rax,%2\n\t" \
"movq %%rdx,%1\n\t" \
: "=&g" (overflow), /* return */ \
"=g" (*remainder), \
"=g" (*quotient) \
: "d" (0), /* argument */ \
"a" ((*dividend)), \
"g" ((*divisor)) \
/*: "rax", "rdx", you'd think you need this to */ \
/* describe these registers as no longer containing */ \
/* the assigned input values after asm block */ \
/* execution, but will not compile with them set. */ \
I think you want
: "=&r" (overflow), /* return */
"=d" (*remainder),
"=a" (*quotient)
: "1" (0), /* argument */
"2" (*dividend),
"rm" (*divisor)
so the compiler knows that %rax and %rdx are modified.
Yes, and when doing that I can remove the two "movq" insns from the asm
block, as the compiler will generate them for me.
I also kept the double parentheses because this is a macro; it seems
(*(dividend)) is correct, the idea being to allow arguments such as
(*(&foo->bar.fubar)).
I also made overflow an input constraint initialized to the value zero,
and the best __asm__ block I could come up with, as far as I can see,
became:
pseudo-prototype: extern void U64_DIVIDE_ASM(u_int64_t *quotient,
u_int64_t *remainder, const u_int64_t *dividend, const u_int64_t
*divisor, int &overflow);
#define U64_DIVIDE_ASM(quotient, remainder, dividend, divisor, overflow) do { \
__asm__ __volatile__( \
"\n\t" \
"divq %6\n\t" \
\
"jnc 1f\n\t" \
"inc %0\n" \
"1:\n\t" \
: "=r" (overflow), /* return */ \
"=d" (*(remainder)), \
"=a" (*(quotient)) \
: "0" (0), \
"1" (0), /* argument */ \
"2" (*(dividend)), \
"rm" (*(divisor)) \
/*: "rax", "rdx"*/ /* no side effects */ \
); \
} while(0)
This gives the compiler the most options for code generation. It results
in a free-standing function of:
u64_divide:
movq (%rdi), %rdi
xorl %r8d, %r8d
movq %rdx, %r9
movl %r8d, %edx
movq %rdi, %rax
#APP
divq (%rsi)
jnc 1f
inc %r8d
1:
#NO_APP
movq %rdx, (%rcx)
movq %rax, (%r9)
movl %r8d, %eax
ret
But this does not demonstrate the possibilities that the constraints in
the original example allowed for and that the compiler failed to take
advantage of.
To me that makes the original example a clear test case to improve upon,
and other things might improve as a result of that work. Maybe the
problem is that GCC treats the "Setup of inputs" and "Allocation of extra
registers" as a single phase to be done together, so in the example it
did not see that %r8d was free for use, because it had been made busy
helping to initialize the input %rdx.
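For reference, the constraint idea discussed above (binding the outputs
to %rax and %rdx with "=a" and "=d", and tying the inputs to them with
matching constraints so that no explicit "movq" is needed) can be
sketched as a small stand-alone function. This is only a minimal sketch
and not the macro from this thread: the name u64_divmod is hypothetical,
the non-x86-64 fallback branch is an assumption added so the code builds
elsewhere, and no overflow reporting is attempted because with a zero
high half the quotient always fits in 64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper for illustration; not the macro from this thread. */
static inline uint64_t u64_divmod(uint64_t dividend, uint64_t divisor,
                                  uint64_t *remainder)
{
    assert(divisor != 0);   /* dividing by zero would raise #DE */
#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t quot, rem;
    /* %0 = quotient in rax, %1 = remainder in rdx,
       %2 = dividend (shares rax), %3 = zero high half (shares rdx),
       %4 = divisor, in a register or in memory. */
    __asm__("divq %4"
            : "=a"(quot), "=d"(rem)
            : "0"(dividend), "1"((uint64_t)0), "rm"(divisor)
            : "cc");
    *remainder = rem;
    return quot;
#else
    /* Assumed portable fallback for other targets. */
    *remainder = dividend % divisor;
    return dividend / divisor;
#endif
}
```

Because %rax and %rdx are declared as outputs, the compiler knows they
are modified and chooses where the results go afterwards, so the asm
body is reduced to the single divq instruction.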
Thank you for your interest in this matter.
Darryl