Re: Set up for better tree diff optimizations

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Junio C Hamano <junkio@xxxxxxx> writes:

> Linus Torvalds <torvalds@xxxxxxxxxxxxxxxxxxxx> writes:
>
>> On Wed, 21 Mar 2007, Junio C Hamano wrote:
>> ...
>>> Would something like this suit your taste?
>>
>> This looks fine, although the reason I didn't get it done myself is that I 
>> have this nagging feeling that there must be some clever way to make it 
>> even faster. I hated having to do the "strncmp()" early, when it's not 
>> always needed. 
> ...
> We _could_ check never_interesting first and if it is already
> dropped then defer strncmp() and check the pathlen > matchlen
> comparison first.

So this is the round #2, as a replacement patch of what I sent
earlier.  Once we know that there are pathspecs that sort later
than the current path, we can defer doing strncmp() and skip
that pathspec early by comparing length.  If the path is longer
than the spec, we can tell that it would never match without
comparing.

Best time for "git-rev-list HEAD -- arch/i386/ include/asm-i386/"
are (ten runs each):

without any patch:	1.17user
with the previous:	1.13user
with this patch:	1.11user

-- >8 --
Teach tree_entry_interesting() that the tree entries are sorted.

When we are looking at a tree entry with pathspecs, if all the
pathspecs sort strictly earlier than the entry we are currently
looking at, there is no way later entries in the same tree would
match out pathspecs, because the entries are sorted.

Signed-off-by: Junio C Hamano <junkio@xxxxxxx>
---

 tree-diff.c |   55 +++++++++++++++++++++++++++++++++++++++++++++++++------
 1 files changed, 49 insertions(+), 6 deletions(-)

diff --git a/tree-diff.c b/tree-diff.c
index 3678805..15fd665 100644
--- a/tree-diff.c
+++ b/tree-diff.c
@@ -81,6 +81,7 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
 	unsigned mode;
 	int i;
 	int pathlen;
+	int never_interesting = -1;
 
 	if (!opt->nr_paths)
 		return 1;
@@ -89,9 +90,10 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
 
 	pathlen = tree_entry_len(path, sha1);
 
-	for (i=0; i < opt->nr_paths; i++) {
+	for (i = 0; i < opt->nr_paths; i++) {
 		const char *match = opt->paths[i];
 		int matchlen = opt->pathlens[i];
+		int m = -1; /* signals that we haven't called strncmp() */
 
 		if (baselen >= matchlen) {
 			/* If it doesn't match, move along... */
@@ -109,6 +111,37 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
 		match += baselen;
 		matchlen -= baselen;
 
+		if (never_interesting) {
+			/*
+			 * We have not seen any match that sorts later
+			 * than the current path.
+			 */
+
+			/*
+			 * Does match sort strictly earlier than path
+			 * with their common parts?
+			 */
+			m = strncmp(match, path,
+				    (matchlen < pathlen) ? matchlen : pathlen);
+			if (m < 0)
+				continue;
+
+			/*
+			 * If we come here even once, that means there is at
+			 * least one pathspec that would sort equal to or
+			 * later than the path we are currently looking at.
+			 * In other words, if we have never reached this point
+			 * after iterating all pathspecs, it means all
+			 * pathspecs are either outside of base, or inside the
+			 * base but sorts strictly earlier than the current
+			 * one.  In either case, they will never match the
+			 * subsequent entries.  In such a case, we initialized
+			 * the variable to -1 and that is what will be
+			 * returned, allowing the caller to terminate early.
+			 */
+			never_interesting = 0;
+		}
+
 		if (pathlen > matchlen)
 			continue;
 
@@ -119,12 +152,22 @@ static int tree_entry_interesting(struct tree_desc *desc, const char *base, int
 				continue;
 		}
 
-		if (strncmp(path, match, pathlen))
-			continue;
-
-		return 1;
+		if (m == -1)
+			/*
+			 * we cheated and did not do strncmp(), so we do
+			 * that here.
+			 */
+			m = strncmp(match, path, pathlen);
+
+		/*
+		 * If common part matched earlier then it is a hit,
+		 * because we rejected the case where path is not a
+		 * leading directory and is shorter than match.
+		 */
+		if (!m)
+			return 1;
 	}
-	return 0; /* No matches */
+	return never_interesting; /* No matches */
 }
 
 /* A whole sub-tree went away or appeared */

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]