Tim Prince wrote:
...
Also note that gcc 4.3 with -O2 is smart enough to recognise that
neither of these loops does anything and remove them both entirely:
I can't recall whether the gcc version, hardware, or compile flags
were given in the original post. If you
are running on a machine where the sequence dec ebx; jns is subject to a
partial flags stall, that might be an explanation. If you used an up to
date gcc (evidently not), and wrote a loop which was not trivial to
remove (evidently not), and got such code when specifying -march for a
machine which doesn't like that sequence, this would be a bug for which
you should file a PR. Note that for about six years now, using dec
rather than add -1 has been a standard way to write code which might be
OK on AMD but dead slow on Intel.
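
To illustrate what "not trivial to remove" could mean for a timing test,
here is a minimal sketch. It is not taken from the original post; the
function names and the array argument are hypothetical. The first loop has
no observable effect, so an optimizing compiler may delete it entirely. The
second counts down the same way but produces a value the caller uses, so
the work cannot simply be dropped.

```c
#include <stddef.h>

/* Hypothetical illustration, not code from the original post. */

/* Countdown loop with an empty body: nothing observable happens, so an
   optimizing compiler is free to remove the whole loop. */
void empty_countdown(int n)
{
    int i;
    for (i = n - 1; i >= 0; i--)
        ;
}

/* Same countdown, but each iteration contributes to a result that is
   returned to the caller, so the compiler must still compute the sum
   (it remains free to choose how, e.g. dec/jns or add -1 with a branch). */
unsigned long sum_countdown(const unsigned *data, int n)
{
    unsigned long total = 0;
    int i;
    for (i = n - 1; i >= 0; i--)
        total += data[i];
    return total;
}
```

Compiling such a file with gcc -O2 -S, with and without a -march option,
and reading the generated assembly should show which decrement sequence a
given gcc version picks for the second loop.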
Thanks for your explanations. I have tried the option -march=pentium4, but
there is no improvement in this case. I will try again when I install gcc 4.x.
Sebastien