[PATCH 4/4] diffcore-pickaxe: optimize by trimming common initial and trailing parts

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



We can optimize the pickaxe search by removing the lines at the beginning
and at the end that are common between the preimage and the postimage, and
counting the occurrences of the "needle" in the haystack that is now made
smaller.

One thing that we need to be careful about is that we should not remove
the common part in full.  We need to keep at least the same length as the
"needle" from the common part, so that we can detect the differences in a
case like this:

  preimage: common part then needle appears in this file
 postimage: common part then needlX appears in this file

If we literally remove the common leading part up to "needl" and start
counting from the byte after that, we will miss that preimage has an
occurrence of "needle" but postimage does not.

With this optimization in place, the following query in the Linux kernel
repository on my machine becomes about 40% faster:

$ STRING='Ensure that the real time constraints are schedulable.'
$ git log -S"$STRING" HEAD -- kernel/sched.c >/dev/null

(Before the patch, best of 5 runs)
5.59user 0.15system 0:05.74elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+39956minor)pagefaults 0swaps

(After the patch, best of 5 runs)
3.04user 0.17system 0:03.23elapsed 99%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+49697minor)pagefaults 0swaps

Signed-off-by: Junio C Hamano <gitster@xxxxxxxxx>
---
 diffcore-pickaxe.c |   54 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 files changed, 52 insertions(+), 2 deletions(-)

diff --git a/diffcore-pickaxe.c b/diffcore-pickaxe.c
index 4a0dca1..b447d75 100644
--- a/diffcore-pickaxe.c
+++ b/diffcore-pickaxe.c
@@ -68,9 +68,59 @@ static int has_different_matches(struct diff_filepair *p,
 				 const char *needle, unsigned long len,
 				 regex_t *regexp)
 {
-	return (count_match(0, p->one, needle, len, regexp)
-		!= count_match(0, p->two, needle, len, regexp));
+	const char *one, *two;
+	unsigned long one_size, two_size, common, trim;
+	int differs;
 
+	if (regexp)
+		return (count_match(0, p->one, needle, len, regexp)
+			!= count_match(0, p->two, needle, len, regexp));
+	if (diff_populate_filespec(p->one, 0) ||
+	    diff_populate_filespec(p->two, 0))
+		return 0;
+
+	one = p->one->data;
+	two = p->two->data;
+	one_size = p->one->size;
+	two_size = p->two->size;
+	common = (one_size < two_size) ? one_size : two_size;
+
+	/* Skip the initial common part */
+	for (trim = 0;
+	     trim < common && one[trim] == two[trim];
+	     trim++)
+		;
+
+	/* Start comparing at (len-1) bytes before the first difference */
+	if (len <= trim) {
+		trim -= len;
+		one += trim;
+		two += trim;
+		common -= trim;
+		one_size -= trim;
+		two_size -= trim;
+	}
+
+	/* Trim the common part at the end */
+	for (trim = 0;
+	     trim < common && one[one_size - trim - 1] == two[two_size - trim - 1];
+	     trim++)
+		;
+
+	/* Leave (len-1) bytes */
+	if (len <= trim) {
+		trim -= len;
+		one_size -= trim;
+		two_size -= trim;
+	}
+
+	differs = (count_match_internal(0, one, one_size, needle, len, regexp) !=
+		   count_match_internal(0, two, two_size, needle, len, regexp));
+
+	diff_free_filespec_data(p->one);
+	diff_free_filespec_data(p->two);
+
+	return differs;
 }
 
 static int pickaxe_match_one(struct diff_filepair *p,
-- 
1.6.2.rc2.91.gf9a36

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]

  Powered by Linux