We can optimize the pickaxe search by removing the lines at the beginning and at the end that are common between the preimage and the postimage, and counting the occurrences of the "needle" in the haystack that is now made smaller. One thing that we need to be careful about is that we should not remove the common part in full. We need to keep at least the same length as the "needle" from the common part, so that we can detect the differences in a case like this: preimage: common part then needle appears in this file postimage: common part then needlX appears in this file If we literally remove the common leading part up to "needl" and start counting from the byte after that, we will miss that preimage has an occurrence of "needle" but postimage does not. With this optimization in place, the following query in the Linux kernel repository on my machine becomes about 40% faster: $ STRING='Ensure that the real time constraints are schedulable.' $ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null (Before the patch, best of 5 runs) 5.59user 0.15system 0:05.74elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+39956minor)pagefaults 0swaps (After the patch, best of 5 runs) 3.04user 0.17system 0:03.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k 0inputs+0outputs (0major+49697minor)pagefaults 0swaps Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx> --- diffcore-pickaxe.c | 54 ++++++++++++++++++++++++++++++++++++++++++++++++++- 1 files changed, 52 insertions(+), 2 deletions(-) diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c index 4a0dca1..b447d75 100644 --- a/diffcore-pickaxe.c +++ b/diffcore-pickaxe.c @@ -68,9 +68,59 @@ static int has_different_matches(struct diff_filepair *p, const char *needle, unsigned long len, regex_t *regexp) { - return (count_match(0, p->one, needle, len, regexp) - != count_match(0, p->two, needle, len, regexp)); + const char *one, *two; + unsigned long one_size, two_size, common, trim; + int differs; + if (regexp) + return (count_match(0, p->one, needle, len, regexp) + != count_match(0, p->two, needle, len, regexp)); + if (diff_populate_filespec(p->one, 0) || + diff_populate_filespec(p->two, 0)) + return 0; + + one = p->one->data; + two = p->two->data; + one_size = p->one->size; + two_size = p->two->size; + common = (one_size < two_size) ? one_size : two_size; + + /* Skip the initial common part */ + for (trim = 0; + trim < common && one[trim] == two[trim]; + trim++) + ; + + /* Start comparing at (len-1) bytes before the first difference */ + if (len <= trim) { + trim -= len; + one += trim; + two += trim; + common -= trim; + one_size -= trim; + two_size -= trim; + } + + /* Trim the common part at the end */ + for (trim = 0; + trim < common && one[one_size - trim - 1] == two[two_size - trim - 1]; + trim++) + ; + + /* Leave (len-1) bytes */ + if (len <= trim) { + trim -= len; + one_size -= trim; + two_size -= trim; + } + + differs = (count_match_internal(0, one, one_size, needle, len, regexp) != + count_match_internal(0, two, two_size, needle, len, regexp)); + + diff_free_filespec_data(p->one); + diff_free_filespec_data(p->two); + + return differs; } static int pickaxe_match_one(struct diff_filepair *p, -- 1.6.2.rc2.91.gf9a36 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html