Re: GCC asm block optimizations on x86_64

Rask Ingemann Lambertsen wrote:
On Mon, Aug 27, 2007 at 06:11:04AM +0100, Darryl L. Miles wrote:
	#define U64_DIVIDE_ASM(quotient, remainder, dividend, divisor, overflow)	do {	\
		__asm__ __volatile__(						\
			"\n\t"							\
			"xorl %0,%0\n\t"					\
			"divq %5\n\t"						\
										\
			"jnc 1f\n\t"						\
			"incl %0\n"						\
			"1:\n\t"						\
			"movq %%rax,%2\n\t"					\
			"movq %%rdx,%1\n\t"					\
			: "=&g" (overflow),		/* return */		\
			  "=g" (*remainder),					\
			  "=g" (*quotient)					\
			: "d" (0),			/* argument */		\
			  "a" ((*dividend)),					\
			  "g" ((*divisor))					\
			/*: "rax", "rdx", you'd think you need this to */	\
			/* describe these registers as no longer containing */	\
			/* the assigned input values after asm block */		\
			/* execution, but will not compile with them set. */	\

   I think you want

: "=&r" (overflow),	/* return */
  "=d" (*remainder),
  "=a" (*quotient)
: "1" (0),		/* argument */
  "2" (*dividend),
  "rm" (*divisor)

so the compiler knows that %rax and %rdx are modified.
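Here is a minimal sketch of the matching-constraint approach, reduced to the bare division. div_u64 is a hypothetical name, and the #else branch is an assumed portable fallback for non-x86-64 builds. "=a"/"=d" bind the outputs to %rax/%rdx, and "0"/"1" tell GCC to preload the dividend and the zero high half into those same registers, so no clobber list is needed. The GCC manual says a clobbered register may not overlap an input or output operand, which is presumably why the original version failed to compile with "rax", "rdx" listed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: 64-by-64-bit unsigned division.
 * Precondition: divisor != 0 (divq would raise #DE otherwise). */
static inline uint64_t div_u64(uint64_t dividend, uint64_t divisor,
                               uint64_t *remainder)
{
    assert(divisor != 0);
#if defined(__x86_64__) && defined(__GNUC__)
    uint64_t quot, rem;
    __asm__("divq %4"
            : "=a" (quot),           /* %0: quotient is left in %rax    */
              "=d" (rem)             /* %1: remainder is left in %rdx   */
            : "0" (dividend),        /* %2: low half, preloaded in %rax */
              "1" ((uint64_t)0),     /* %3: high half, preloaded in %rdx */
              "rm" (divisor));       /* %4: divisor, register or memory */
    *remainder = rem;
    return quot;
#else
    /* Assumed fallback for targets without x86-64 inline asm. */
    *remainder = dividend % divisor;
    return dividend / divisor;
#endif
}
```

For example, div_u64(100, 7, &r) should return 14 and leave 2 in r.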

Yes, and when doing that I can remove the two "movq" insns from the asm block, as the compiler will generate them for me.

I also kept the double parentheses because of the macro; it seems (*(dividend)) is correct, the idea being to allow for (*(&foo->bar.fubar)). I also gave overflow a matching input constraint initialized to zero, and the best __asm__ block I could churn out, AFAICS, became:


pseudo-prototype: extern void U64_DIVIDE_ASM(u_int64_t *quotient, u_int64_t *remainder, const u_int64_t *dividend, const u_int64_t *divisor, int &overflow);

#define U64_DIVIDE_ASM(quotient, remainder, dividend, divisor, overflow) do { \
                __asm__ __volatile__(                            \
                        "\n\t"                                   \
                        "divq %6\n\t"                            \
                                                                 \
                        "jnc 1f\n\t"                             \
                        "inc %0\n"                               \
                        "1:\n\t"                                 \
                        : "=r" (overflow),              /* return */   \
                          "=d" (*(remainder)),                         \
                          "=a" (*(quotient))                           \
                        : "0" (0),                                     \
                          "1" (0),                      /* argument */ \
                          "2" (*(dividend)),                           \
                          "rm" (*(divisor))                            \
                        /*: "rax", "rdx"*/      /* no side effects */  \
                );                                                     \
        } while(0)
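To illustrate why the macro writes *(dividend) rather than *dividend, here is a small sketch using hypothetical LOAD_BARE/LOAD_PAREN macros and made-up struct names. For an address such as &foo->bar.fubar both spellings happen to work, because -> and . bind tighter than unary & and *; the parentheses start to matter when the argument is an expression such as arr + 1.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical macros contrasting a bare and a parenthesized
 * dereference of a macro argument. */
#define LOAD_BARE(p)   (*p)     /* argument pasted without grouping     */
#define LOAD_PAREN(p)  (*(p))   /* argument grouped before dereferencing */

struct inner { uint64_t fubar; };
struct outer { struct inner bar; };

/* Mirrors the (*(&foo->bar.fubar)) case: -> and . bind tighter than
 * unary & and *, so either macro would read the member here. */
static uint64_t load_member(struct outer *foo)
{
    assert(foo);
    return LOAD_PAREN(&foo->bar.fubar);
}

/* Expands to (*arr + 1), i.e. arr[0] + 1: almost certainly a bug. */
static uint64_t load_second_bare(const uint64_t *arr)
{
    return LOAD_BARE(arr + 1);
}

/* Expands to (*(arr + 1)), i.e. arr[1], as intended. */
static uint64_t load_second_paren(const uint64_t *arr)
{
    return LOAD_PAREN(arr + 1);
}
```

With uint64_t a[2] = {10, 20}, load_second_paren(a) yields 20 while load_second_bare(a) yields 11.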

The U64_DIVIDE_ASM macro above gives the compiler the most options for code generation. Used in a free-standing function, it results in:

u64_divide:
        movq    (%rdi), %rdi
        xorl    %r8d, %r8d
        movq    %rdx, %r9
        movl    %r8d, %edx
        movq    %rdi, %rax
#APP

        divq (%rsi)
        jnc 1f
        inc %r8d
1:

#NO_APP
        movq    %rdx, (%rcx)
        movq    %rax, (%r9)
        movl    %r8d, %eax
        ret
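The post doesn't show the C source of u64_divide. Judging only from the register usage in the listing (dividend pointer in %rdi, divisor pointer in %rsi, quotient pointer in %rdx, remainder pointer in %rcx, overflow returned in %eax), a wrapper along the following lines is a plausible reconstruction; the signature is an inference, not the author's code. It also departs from the macro in one respect: Intel documents CF as undefined after DIV, and a quotient that doesn't fit raises a divide error (#DE) rather than setting a flag. This sketch therefore reports overflow from a C-level divisor == 0 check instead of jnc. With the high half fixed at zero, a zero divisor is the only input that can fault.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed signature, inferred from the register usage in the listing. */
int u64_divide(const uint64_t *dividend, const uint64_t *divisor,
               uint64_t *quotient, uint64_t *remainder)
{
    assert(dividend && divisor && quotient && remainder);

    /* With %rdx zeroed, divq can only fault on a zero divisor, so report
     * that case instead of executing the instruction. */
    if (*divisor == 0)
        return 1;

#if defined(__x86_64__) && defined(__GNUC__)
    __asm__("divq %4"
            : "=d" (*remainder),     /* %0: remainder stored from %rdx  */
              "=a" (*quotient)       /* %1: quotient stored from %rax   */
            : "0" ((uint64_t)0),     /* %2: high half preloaded in %rdx */
              "1" (*dividend),       /* %3: low half preloaded in %rax  */
              "rm" (*divisor));      /* %4: divisor, register or memory */
#else
    /* Assumed fallback for targets without x86-64 inline asm. */
    *quotient  = *dividend / *divisor;
    *remainder = *dividend % *divisor;
#endif
    return 0;
}
```

A call with n = 100 and d = 7, as in u64_divide(&n, &d, &q, &r), should return 0 and store 14 in q and 2 in r.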


But this doesn't demonstrate what was missed in the original example: code-generation possibilities that the constraints allowed for but that the compiler didn't see.

To me that indicates a clear test case to improve upon, and it would be worth seeing what else improves as a result of that work. Maybe the problem is that GCC treats "Setup of inputs" and "Allocation of extra registers" as a single phase, so in the example it did not see that %r8d was free for use, because %r8d had been made busy helping to initialize the input %rdx.

Thank you for your interest in this matter.


Darryl
