Empty loops

jlh <jlh@xxxxxx> · Wed, 24 Nov 2004 15:18:41 +0100

Hi everyone,

This is a quote from the docs of gcc-3.4.3:

docs> Historically, GCC has not deleted empty loops under the assumption
docs> that the most likely reason you would put one in a program is to
docs> have a delay

It might have once been a common thing to do, but using empty loops as
delays is bad style (there probably are exceptions), most will agree on
that.  And history is a bad reason to do things the non-optimal way.
But this is not my point.

docs> ...so deleting them will not make real programs run any faster.
docs> However, the rationale here is that optimization of a nonempty
docs> loop cannot produce an empty one, which holds for C but is not
docs> always the case for C++.

This is the part I don't get.  Why can the optimization of a C loop
never yield an empty loop?  Take this trivial example:

long f()
{
        long i, ret = 0;

        for (i = 0; i < 1000000000; i++)
                ret++;

        return(ret);
}

Of course no-one would write this, but it's an example where an
obviously non-empty loop gets converted to an empty one after
optimization.  "gcc-3.4.3 -c -O2" produces (on x86):

f:      pushl   %ebp
        movl    $999999999, %eax
        movl    %esp, %ebp
.L5:    decl    %eax
        jns     .L5
        popl    %ebp
        movl    $1000000000, %eax
        ret

.L5 is an empty loop because the "ret++" has been moved out of the loop
and the return value can be set directly.  Removing this loop would make
the code faster, because looping to a billion is very expensive, and in
this situation, the programmer probably didn't intend to cause a delay.

Of course the example above is silly, but there surely are situations
where such trivial loops like the one above can result, without that the
programmer obviously notices it (for example after macro expansion).

For C++, the docs say that things are different.  Here's another
example, that is somewhat more realistic.  First we define a function
template that counts how many numbers in a hardcoded range have a given
property (passed as a functor).

template<class F>
static long f(F func)
{
        long ret = 0;

        for (long l = 0; l < 1000000003; l++)
                if (func(l))
                        ret++;

        return(ret);
}

Such a function template might very well appear in a real life example.
Now I call this using a functor that happens to always return true.

class Functor
{
public:
        inline bool operator()(long l)
        {
                return(true);
        }
};

class Functor my_functor;

long g()
{
        return(f(my_functor));
}

gcc-3.4.3 inlines the template function f<Functor>() in g() and
optimizes the loop in a similar way as in the first C example.  The
result is as follows:

_Z1gv:  pushl   %ebp
        movl    $999999999, %eax
        movl    %esp, %ebp
.L7:    subl    $4, %eax
        jns     .L7
        popl    %ebp
        movl    $1000000003, %eax
        ret

Again, I get an empty loop that delays the execution unecessarily.
Though, you'll notice that the loop has been partially unrolled to take
steps of 4 in every iteration.  If you let the loop go to 1'000'000'000
instead of 1'000'000'003, you'll even get steps of 25.  But this would
then still mean 40'000'000 iterations, which take about 70ms to execute
on my system.  Too much for something that does nothing.

Removing loops that were initially empty in the source is one thing.
(But removing them would encourage people to write proper delays; and
there would still be the work-arounds to keep empty loops.)

IMHO, removing initially non-empty loops that were transformed into
empty ones by optimizations is a different thing, which should be done.

docs> Moreover, with -funroll-loops small "empty" loops are already
docs> removed, so the current behavior is both sub-optimal and
docs> inconsistent and will change in the future.

Ok, so it's sub-optimal and it is known.  I have only a very vague
understanding of GCC's internals, but removing empty loops can't be very
complicated, can it?  Under this assumption (correct me if I'm wrong),
there must be other reasons to keep them.  So what are they?

(btw, I tried gcc-3.4-20041119, with the same results and
gcc-4.0-20041121 with IMHO worse results. )

Ps: As a side question: Why does gcc not partially unroll the loop to
take several steps per iteration, as g++ did?  (Note: gcc-4.0-20041121
didn't do it either.)  There are probably options to enable it, but
shouldn't this be enabled with -O2?

jlh

Attachment:
signature.asc

Description: OpenPGP digital signature