Hi everyone,
This is a quote from the docs of gcc-3.4.3:
docs> Historically, GCC has not deleted empty loops under the assumption docs> that the most likely reason you would put one in a program is to docs> have a delay
It might have once been a common thing to do, but using empty loops as delays is bad style (there probably are exceptions), most will agree on that. And history is a bad reason to do things the non-optimal way. But this is not my point.
docs> ...so deleting them will not make real programs run any faster. docs> However, the rationale here is that optimization of a nonempty docs> loop cannot produce an empty one, which holds for C but is not docs> always the case for C++.
This is the part I don't get. Why can the optimization of a C loop never yield an empty loop? Take this trivial example:
long f() { long i, ret = 0;
for (i = 0; i < 1000000000; i++) ret++;
return(ret); }
Of course no-one would write this, but it's an example where an obviously non-empty loop gets converted to an empty one after optimization. "gcc-3.4.3 -c -O2" produces (on x86):
f: pushl %ebp movl $999999999, %eax movl %esp, %ebp .L5: decl %eax jns .L5 popl %ebp movl $1000000000, %eax ret
.L5 is an empty loop because the "ret++" has been moved out of the loop and the return value can be set directly. Removing this loop would make the code faster, because looping to a billion is very expensive, and in this situation, the programmer probably didn't intend to cause a delay.
Of course the example above is silly, but there surely are situations where such trivial loops like the one above can result, without that the programmer obviously notices it (for example after macro expansion).
For C++, the docs say that things are different. Here's another example, that is somewhat more realistic. First we define a function template that counts how many numbers in a hardcoded range have a given property (passed as a functor).
template<class F> static long f(F func) { long ret = 0;
for (long l = 0; l < 1000000003; l++) if (func(l)) ret++;
return(ret); }
Such a function template might very well appear in a real life example. Now I call this using a functor that happens to always return true.
class Functor { public: inline bool operator()(long l) { return(true); } };
class Functor my_functor;
long g() { return(f(my_functor)); }
gcc-3.4.3 inlines the template function f<Functor>() in g() and optimizes the loop in a similar way as in the first C example. The result is as follows:
_Z1gv: pushl %ebp movl $999999999, %eax movl %esp, %ebp .L7: subl $4, %eax jns .L7 popl %ebp movl $1000000003, %eax ret
Again, I get an empty loop that delays the execution unecessarily. Though, you'll notice that the loop has been partially unrolled to take steps of 4 in every iteration. If you let the loop go to 1'000'000'000 instead of 1'000'000'003, you'll even get steps of 25. But this would then still mean 40'000'000 iterations, which take about 70ms to execute on my system. Too much for something that does nothing.
Removing loops that were initially empty in the source is one thing. (But removing them would encourage people to write proper delays; and there would still be the work-arounds to keep empty loops.)
IMHO, removing initially non-empty loops that were transformed into empty ones by optimizations is a different thing, which should be done.
docs> Moreover, with -funroll-loops small "empty" loops are already docs> removed, so the current behavior is both sub-optimal and docs> inconsistent and will change in the future.
Ok, so it's sub-optimal and it is known. I have only a very vague understanding of GCC's internals, but removing empty loops can't be very complicated, can it? Under this assumption (correct me if I'm wrong), there must be other reasons to keep them. So what are they?
(btw, I tried gcc-3.4-20041119, with the same results and gcc-4.0-20041121 with IMHO worse results. )
Ps: As a side question: Why does gcc not partially unroll the loop to take several steps per iteration, as g++ did? (Note: gcc-4.0-20041121 didn't do it either.) There are probably options to enable it, but shouldn't this be enabled with -O2?
jlh
Attachment:
signature.asc
Description: OpenPGP digital signature