Re: GCC asm block optimizations on x86_64

On Wed, Aug 29, 2007 at 12:11:22PM +0100, Darryl Miles wrote:
> Rask Ingemann Lambertsen wrote:
> >On Tue, Aug 28, 2007 at 11:02:49PM +0100, Darryl Miles wrote:
> >   Peephole definitions check for cases like this and won't do the
> >optimization clobbering the flags register if the flags register is live at
> >that point.
> 
> So I take it that peephole works by knowing the instructions emitted 
> with annotations about the lifetimes of registers / flags / other useful 
> stuff to help it.  I was thinking it was a bit more blind to things than 
> that.

   The peephole2 pass tries to match one or more instructions in GCC's RTL
format against a template, and if successful, replaces those instructions by
one or more new ones. You can include a condition for the replacement as
well as requiring a scratch register to be available. In this particular
case, the condition is that the flags register isn't live.
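
   As a rough illustration of that condition, here is a toy model in C. It
is not GCC's RTL or its actual peephole2 code: the instruction kinds, the
struct insn layout and the helper names are all made up for this sketch, and
it assumes the flags are dead at the end of the block. It rewrites "mov $0,
reg" into "xor reg, reg" only when no later instruction reads the flags
before they are overwritten.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical instruction kinds; just enough to model the idea. */
enum opcode {
    OP_MOV_IMM,   /* mov $imm, reg : does not touch the flags */
    OP_XOR_SELF,  /* xor reg, reg  : zeroes reg, clobbers the flags */
    OP_CMP,       /* cmp           : sets the flags */
    OP_JCC,       /* jcc           : reads the flags */
    OP_OTHER      /* assumed to neither set nor read the flags */
};

struct insn {
    enum opcode op;
    int reg;      /* destination register number, where applicable */
    long imm;     /* immediate operand for OP_MOV_IMM */
};

static bool sets_flags(enum opcode op)
{
    return op == OP_XOR_SELF || op == OP_CMP;
}

static bool reads_flags(enum opcode op)
{
    return op == OP_JCC;
}

/* The flags are live after insn i if a later insn reads them
   before another insn overwrites them. */
static bool flags_live_after(const struct insn *code, size_t n, size_t i)
{
    for (size_t j = i + 1; j < n; j++) {
        if (reads_flags(code[j].op))
            return true;
        if (sets_flags(code[j].op))
            return false;
    }
    return false; /* assumption: flags are dead at the end of the block */
}

/* Replace "mov $0, reg" by "xor reg, reg" where the flags are dead.
   Returns the number of replacements made. */
static size_t peephole_zero(struct insn *code, size_t n)
{
    size_t replaced = 0;
    for (size_t i = 0; i < n; i++) {
        if (code[i].op == OP_MOV_IMM && code[i].imm == 0
            && !flags_live_after(code, n, i)) {
            code[i].op = OP_XOR_SELF;
            replaced++;
        }
    }
    return replaced;
}
```

   With this model, the sequence cmp; mov $0; jcc keeps the mov, because the
jcc still needs the flags set by the cmp, while mov $0; cmp; jcc gets the
mov rewritten, because the cmp overwrites the flags before anything reads
them.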

> I did not understand the relevance to knowing if it is (*movsi_xxx) or 
> (*movdi_xxx).  From my point of view knowing that would not alter the 2 
> original points I was making [1] and [3].  Maybe there is some 
> pipelining (or other complex) issue I don't know about which makes the 
> emitted code better than what I'm suggesting.

   No, it's just that when debugging problems with poor code, it's best to
know exactly what the compiler thinks it's generating.
 
> Recapping on the original issues:
> 
> [1] failure to treat setting a register to the value of zero as a 
> special case (since there maybe many ways to achieve this on a given 
> CPU, different methods have different trades, insn length, unwanted side 
> effects) which may allow this operation a lot of freedom for moving / 
> scheduling.

> [3] usage of %ebx when %r8d would have been a better choice, at the time 
> %ebx is needed to be allocated the lifetime of the temporary use of %r8d 
> was over.  i.e. allocating of registers which form outputs but not 
> inputs should take place last thing (at the moment of #APP) maybe by 
> doing this %r8d would have been a candidate ?  which would negate the 
> need for the push/pop's.

   GCC's register allocator isn't as good as we'd like it to be. I think
it's causing both problems.

> Thanks for your thoughts.  Maybe I am just expecting too much.

   I don't think so.

-- 
Rask Ingemann Lambertsen
