Per Jessen wrote:

> > No, but Stephen mentioned that strcmp() "uses a do/while loop".
> > On a modern CPU, branches don't normally take any CPU cycles.
>
> I would have thought everything takes CPU cycles, modern CPU or not?

The Intel architecture from the Pentium Pro onwards has a great deal of parallelism. It can commence and complete up to 3 instructions per CPU cycle, but in practice it will almost never be able to sustain that rate continuously. Adding another instruction will only require additional CPU cycles if that instruction delays the processing of subsequent instructions.

In particular, branch instructions are dealt with by dedicated logic circuitry which does nothing but process branch instructions. This enables speculative execution to handle branches even when the calculation of the branch condition hasn't completed. The end result is that the only difference between a loop and an unrolled loop is that, with the unrolled loop, the branch-processing logic sits idle.

More generally, duplicating blocks of code (e.g. unrolling loops, or having multiple specialised versions of a routine instead of one generalised version) is usually a net loss on modern architectures. Cache behaviour (particularly for code) has a far greater impact than the total number of instructions executed, because the CPU is much faster than the RAM.

The actual cost of a code cache miss varies depending upon the relative speed of the CPU and RAM, but 400 cycles is typical. You would need a lot of additional instructions before their cost outweighs that of a single cache miss.

-- 
Glynn Clements <glynn@xxxxxxxxxxxxxxxxxx>
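To make the loop-versus-unrolled comparison concrete, here is a rough sketch of a do/while character loop of the kind attributed to strcmp() above, alongside the same loop unrolled four times. This is not the C library's actual strcmp(), and the names my_strcmp_loop and my_strcmp_unrolled are hypothetical. Note that unrolling only removes the loop-back branch: each character still needs its own exit test, so the unrolled version executes fewer branches but its loop body takes up roughly four times as much instruction cache.

```c
/* Hypothetical helper: do/while form, one loop-back branch per character. */
int my_strcmp_loop(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *) s1;
    const unsigned char *p2 = (const unsigned char *) s2;
    unsigned char c1, c2;

    do {
        c1 = *p1++;
        c2 = *p2++;
        if (c1 == '\0')
            break;          /* end of s1: result is decided by c2 */
    } while (c1 == c2);

    /* Characters are compared as unsigned char, as strcmp() requires. */
    return c1 - c2;
}

/* Hypothetical helper: the same loop unrolled four times.  Each character
   still needs its own exit test; only the loop-back branch is saved, at
   the cost of roughly four times the code for the loop body. */
int my_strcmp_unrolled(const char *s1, const char *s2)
{
    const unsigned char *p1 = (const unsigned char *) s1;
    const unsigned char *p2 = (const unsigned char *) s2;
    unsigned char c1, c2;

    for (;;) {
        c1 = *p1++; c2 = *p2++;
        if (c1 == '\0' || c1 != c2) break;
        c1 = *p1++; c2 = *p2++;
        if (c1 == '\0' || c1 != c2) break;
        c1 = *p1++; c2 = *p2++;
        if (c1 == '\0' || c1 != c2) break;
        c1 = *p1++; c2 = *p2++;
        if (c1 == '\0' || c1 != c2) break;
    }

    return c1 - c2;
}
```

Which version runs faster depends on the CPU and on whether the surrounding code still fits in the cache, so it is worth measuring rather than assuming that fewer branches means fewer cycles.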