* Erik Faye-Lund <kusmabite@xxxxxxxxx> wrote:

> Thanks. I also timed on my end (on Windows), and I came to the same
> conclusion (but the improvements of your original was somewhat smaller in my
> end; could be due to the test-case). It seems like the early-out wasn't the
> only reason your original patch performed faster. It could be that memcmp
> (probably) didn't get inlined, and the extra function call outweighs the
> complexity. [...]

Function calls aren't that heavy really. My measurements identified the
following effects:

 - profiling of stalled cycles clearly pinpointed the REP MOV string
   instruction.

 - the patched code had fewer branch-misses

 - the clearer and inlined open-coded loop is probably easier for the CPU
   to speculate along - while REP MOV string ops are 'opaque' and the
   result might be harder to speculate.

So I think the main benefit of my patch is that it avoids the REP MOV
instruction.

Thanks,

	Ingo
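
For readers following the thread, here is a minimal sketch of the kind of inlined, open-coded comparison with an early-out that is being discussed. It is not the actual patch: the name hashcmp_open_coded and the HASH_LEN constant (assumed to be 20 bytes, as for SHA-1) are hypothetical and chosen only for illustration.

```c
/* Assumption: hashes are 20 bytes long (SHA-1 sized). */
#define HASH_LEN 20

/*
 * Hypothetical open-coded replacement for memcmp(a, b, HASH_LEN).
 * A plain byte loop that returns at the first differing byte, so the
 * compiler can inline it and need not emit a REP-prefixed string
 * instruction or a library call.
 */
static inline int hashcmp_open_coded(const unsigned char *a,
				     const unsigned char *b)
{
	int i;

	for (i = 0; i < HASH_LEN; i++) {
		if (a[i] != b[i])
			return a[i] - b[i];	/* early-out on first mismatch */
	}
	return 0;	/* all HASH_LEN bytes are equal */
}
```

Like memcmp, the sketch returns zero for equal hashes and a negative or positive value according to the first differing byte. Whether it actually beats memcmp depends on the compiler, the libc and the CPU, so it would need to be measured in each case.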