Re: Superblock Instruction Scheduling in GCC

Vladimir Makarov <vmakarov@xxxxxxxxxx> · Thu, 04 Dec 2003 10:54:49 -0500

Jan Hubicka wrote:
> 
> >
> > As far as experimentation is concerned, let me give some background about
> > what I am doing and what kind of input I might be able to provide:
> >
> > I am doing research on optimal superblock scheduling and I need to import
> > superblocks (more precisely, superblock data dependence graphs) from gcc to
> > run them through my optimal solver (my research group currently has a way
> > to import basic blocks and I am trying to extend that to superblocks).
> > Even though this optimal solver is currently too slow to be included in a
> > production compiler like gcc, it will be useful for studying the quality
> > of gcc's schedules by comparing them against optimal. It will probably
> > take me two or three months to get to that point for the very simplistic
> > machine models that we are working with, but I'll be more than happy to
> > provide you with any interesting results that I might come up with.
> 
> Yes, this looks interesting.  I would be curious how much gap there is
> in between current scheduling and optimal one deifnitly.
> On GCC mainline, you may also want to experiment with multipass
> scheduling that (to my understanding) given large enought limits should
> produce optimal schedules, but this is Vladimir's domain.
> 

  Actually the first cycle multipass instruction scheduling tries only
to maximize number of insns issued on the current cycle.  An optimal
insn scheduling should minimize execution time of scheduled code which
means it might require to issue not maximal number of insns on a cycle. 
Also the first cycle multipass insn scheduling solves the task in a
primitive way (just trying part of permutations of ready insns without
reusing already evaluated information from other tried permutation). 
This algorithm was just a demonstration of fast work DFA hazard
recognizer.  It improves code for Itanium (as I remember about 2% for
SPECINT2000).

  As for optimal insn scheduling, I wrote a prototype for BB 2-3 years
ago.  It was based on a dynamic programing approach (reusing already
evaluated information about scheduled insn subsequences).  It worked
fast for sequences up to 15-20 insns for ultrasparc.  I see some
improvements (say 7 cycles instead of 9 cycles) for about 30% of bbs on
small benchmarks.  I see the following problem with this approach for
gcc as an industrial compiler:

   1. Gcc has a lot of scheduler hooks which are oriented to list insn
scheduling algorithm.  Many ports uses them.  It is practically
impossible to embed these hooks into an optimal insn scheduling
algorithm.

   2. Long sequences of insns should be divided to keep the algorithm
speed acceptable.  Simple approach to this might work even worse than
list insn scheduling.  Some information (like when data from previous
subsequence will be ready) should be transferred from one subsequence to
another sequence.

  3.  Modern insn scheduling algorithms are now not only about insn
rearranging (in linear code) but can include many other transformations
like code cloning, predication, data/control flow speculation.  I don't
see how these transformations could be a part of an optimal insn
scheduling).  And I think there is more code improving potential using
these transformations than an optimal scheduling.  Although there is
definitely some areas where an optimal scheduling could be useful.

In any case, it would be very interesting to see the results of your
research.  We should encourage people to make researches on gcc.

Vlad