Re: GCC asm block optimizations on x86_64

On Tue, Aug 28, 2007 at 11:02:49PM +0100, Darryl Miles wrote:

> Thanks for the note on the peephole. Can the peephole substitute
> sequences when the lifetimes of various processor features overlap?
> Take the 'flags' bits, for example: you can't peephole a sequence
> that does a compare (setting flag bits), then loads a register with zero
> (not affecting flag bits), then branches on the flag bits. Replacing
> the zero load with 'xor' on i386 would destroy the flags.

   Peephole definitions check for cases like this: a substitution that
clobbers the flags register is not applied if the flags register is live at
that point.
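
   As a rough illustration of that liveness check, here is a toy model in C.
It is not GCC's implementation (GCC expresses peepholes in its machine
description files and uses its own liveness information); the instruction
kinds and function names below are hypothetical, and the model only handles
straight-line code, assuming the flags are dead at the end of the sequence.

```c
#include <assert.h>

/* Hypothetical, simplified instruction kinds for the toy model. */
enum op { OP_CMP, OP_MOV_ZERO, OP_XOR_SELF, OP_JCC, OP_OTHER };

struct insn { enum op op; };

/* cmp and xor write the flags; a conditional jump reads them.
   OP_MOV_ZERO and OP_OTHER are assumed to leave the flags alone. */
static int sets_flags(enum op op)  { return op == OP_CMP || op == OP_XOR_SELF; }
static int reads_flags(enum op op) { return op == OP_JCC; }

/* Flags are live after insn i if a later insn reads them before
   another insn overwrites them. */
static int flags_live_after(const struct insn *code, int n, int i)
{
    for (int j = i + 1; j < n; j++) {
        if (reads_flags(code[j].op))
            return 1;
        if (sets_flags(code[j].op))
            return 0;
    }
    return 0; /* assumption: flags are dead at the end of the sequence */
}

/* Replace "mov $0, reg" with "xor reg, reg" only where the flags are dead.
   Returns the number of substitutions made. */
static int peephole_zero(struct insn *code, int n)
{
    int changed = 0;

    assert(code != 0 && n >= 0);
    for (int i = 0; i < n; i++) {
        if (code[i].op == OP_MOV_ZERO && !flags_live_after(code, n, i)) {
            code[i].op = OP_XOR_SELF;
            changed++;
        }
    }
    return changed;
}
```

   With the sequence "cmp; mov $0; jcc" the model leaves the mov alone, while
"mov $0; cmp; jcc" gets the xor, because the cmp overwrites the flags before
anything reads them.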

>  0000000000000090 <u64_divide>:
>    00:   49 89 d1                mov    %rdx,%r9  <<- [1] save %rdx in %r9 for arg-as-return
>    03:   48 8b 07                mov    (%rdi),%rax
>    06:   ?? ?? ??                xor    %edx,%edx  <<- implicit zero of high 32bits, would accept xorq %rdx,%rdx

   Right, that's why I suggest using "gcc -S -dp" because then it clearly
shows if it's a 32-bit (*movsi_xxx) or a 64-bit (*movdi_xxx) instruction (as
seen from GCC's point of view, since the actual CPU instruction is the same
in this and several other cases).

>    09:   ?? ??                   xor    %r8d,%r8d

   Likewise.

>    0b:   48 f7 36                divq   (%rsi)
>    0e:   73 02                   jae    12 <u64_divide+0x12>
>    10:   ?? ??                   inc    %r8d

   Can't you substitute the "jae; inc %r8d" sequence with "adcl $0, %r8d"?
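
   A minimal sketch of the idea, detached from the divide routine: "jae"
skips the "inc" when the carry flag is clear, so the pair adds exactly the
carry flag to the register, which is what "adc $0" does in one instruction.
The helper name is hypothetical, and the carry here comes from an "addq"
inside the same asm statement rather than from the code quoted above. The
non-x86_64 branch is only a fallback so the example compiles elsewhere.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 if a + b overflows 64 bits, else 0. */
static unsigned carry_of_add(uint64_t a, uint64_t b)
{
    unsigned carry = 0;

#if defined(__GNUC__) && defined(__x86_64__)
    /* addq sets CF on unsigned overflow; adcl $0 then computes
       carry = carry + 0 + CF, with no branch involved. */
    __asm__ ("addq %2, %1\n\t"
             "adcl $0, %0"
             : "+r" (carry), "+r" (a)
             : "rm" (b)
             : "cc");
#else
    /* Portable fallback: unsigned wrap-around means the sum is below a. */
    carry = (uint64_t)(a + b) < a;
#endif
    return carry;
}
```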

>    12:   49 89 01                mov    %rax,(%r9)  <<- [1] use saved %rdx to return argument
>    15:   48 89 11                mov    %rdx,(%rcx)
>    18:   ?? ??                   mov    %r8d,%eax
>    1a:   c3                      retq

> I also did not say which version of GCC I was using. It was 4.0.2, but
> I've just tried 4.2.1 and the same code is generated, although -O6
> appears to inline things further, which led me to find an
> invalid constraint: "g" ((*divisor)) should be "r" ((*divisor)). With "g"
> the compiler tried to use a constant, whereas only a register or memory
> via an indirected register is valid here.

   You can use "rm" for such a constraint.
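
   For illustration, a sketch of a divide wrapper using "rm" for the divisor.
This is not the original poster's u64_divide; the function name, signature
and the assert are my assumptions. "rm" lets GCC pick a register or a memory
operand but never an immediate, which "divq" cannot encode. The __int128
branch is only a fallback for other 64-bit targets.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical wrapper: divides the 128-bit value hi:lo by divisor,
   returns the quotient and stores the remainder in *rem. */
static uint64_t div_u128_by_u64(uint64_t hi, uint64_t lo,
                                uint64_t divisor, uint64_t *rem)
{
    /* divq raises a divide error if the quotient does not fit in 64 bits,
       so require hi < divisor (which also rules out divisor == 0). */
    assert(hi < divisor);

#if defined(__GNUC__) && defined(__x86_64__)
    uint64_t q, r;

    /* divq divides %rdx:%rax by its operand, leaving the quotient in
       %rax and the remainder in %rdx. */
    __asm__ ("divq %4"
             : "=a" (q), "=d" (r)
             : "0" (lo), "1" (hi), "rm" (divisor)
             : "cc");
    *rem = r;
    return q;
#elif defined(__SIZEOF_INT128__)
    unsigned __int128 n = ((unsigned __int128)hi << 64) | lo;

    *rem = (uint64_t)(n % divisor);
    return (uint64_t)(n / divisor);
#else
#error "this sketch needs x86_64 inline asm or a 128-bit integer type"
#endif
}
```

   Calling it with a literal divisor, e.g. div_u128_by_u64(0, 100, 7, &r),
still assembles, because "rm" forces the constant into a register or memory
first.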

> Another concern that occurs to me: if the __asm__ constraints are
> not 100% perfect, is there any way to test every permutation of how
> the compiler might generate the code?

   I suppose you could write a script which outputs "calls" to the asm
construct with a constant, local variable (which we assume will end up in a
register) or global variable for each operand in turn, then try compiling
and assembling (i.e. -c) the resulting code.
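
   A rough sketch of such a generator, written in C rather than as a shell
script. It assumes the asm construct is wrapped in a macro (the macro name is
passed in and is hypothetical) and it emits one function per combination of
constant, local and global across all operands, which is a slightly broader
reading of "each operand in turn". Combinations that put a constant in an
output position are expected to fail to compile; the interesting failures
are the ones on input operands.

```c
#include <assert.h>
#include <stdio.h>

/* The three kinds of argument tried for every operand. */
static const char *const kinds[] = { "42", "local", "global" };
#define N_KINDS 3

/* Writes one test function per combination of argument kinds to `out`
   and returns the number of functions written (3^n_operands).
   `macro` is the (hypothetical) name of the macro wrapping the asm. */
static int emit_cases(FILE *out, const char *macro, int n_operands)
{
    int total = 1;

    assert(out != NULL && macro != NULL && n_operands > 0);
    for (int i = 0; i < n_operands; i++)
        total *= N_KINDS;

    fprintf(out, "unsigned long global = 7;\n");
    for (int c = 0; c < total; c++) {
        int rest = c;

        fprintf(out, "void case_%d(unsigned long local) { %s(", c, macro);
        /* Decode c as a base-3 number: one digit per operand. */
        for (int i = 0; i < n_operands; i++) {
            fprintf(out, "%s%s", i ? ", " : "", kinds[rest % N_KINDS]);
            rest /= N_KINDS;
        }
        fprintf(out, "); }\n");
    }
    return total;
}
```

   The output would be appended to a file that defines the macro and then
compiled with "gcc -c", once per optimization level of interest.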

-- 
Rask Ingemann Lambertsen
