Re: git-diff-tree -M performance regression in 'next'

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On Mon, 13 Mar 2006, Junio C Hamano wrote:
> 
> Here is still an WIP but an insanely fast one (actually this is
> a modified version of what once was in next).  I haven't
> verified the sanity of its output fully, but from a cursory look
> what are found look sensible.  The same v2.6.12..v2.6.14 test on
> my Duron 750:

Heh. I did something similar, except I wanted mine to work with binary 
data too. Not that I know how _well_ it works, but assuming you have 
_some_ '\n' characters to fix up offset mismatches, it might do something.

Mine is a bit less hacky than yours, I believe. It doesn't skip 
whitespace, instead it just maintains a rolling 64-bit number, where each 
character shifts it left by 7 and then adds in the new character value 
(overflow in 32 bits just ignored).

Then it uses your old hash function, except it hides the length in the top 
byte.

It breaks the hashing on '\n' or on hitting a 64-byte sequence, whichever 
comes first.

It's fast and stupid, but doesn't seem to do any worse than your old one. 
The speed comes from the fact that it only does the hash comparisons at 
the "block boundaries", not at every byte.

Anyway, I don't think something like this is really any good for rename 
detection, but it might be good for deciding whether to do a real delta.

		Linus

----
diff --git a/diffcore-delta.c b/diffcore-delta.c
index 835d82c..4c6e512 100644
--- a/diffcore-delta.c
+++ b/diffcore-delta.c
@@ -127,7 +127,7 @@ static struct spanhash_top *add_spanhash
 
 static struct spanhash_top *hash_chars(unsigned char *buf, unsigned long sz)
 {
-	int i;
+	int i, n;
 	unsigned long accum1, accum2, hashval;
 	struct spanhash_top *hash;
 
@@ -137,19 +137,21 @@ static struct spanhash_top *hash_chars(u
 	hash->free = INITIAL_FREE(i);
 	memset(hash->data, 0, sizeof(struct spanhash) * (1<<i));
 
-	/* an 8-byte shift register made of accum1 and accum2.  New
-	 * bytes come at LSB of accum2, and shifted up to accum1
-	 */
-	for (i = accum1 = accum2 = 0; i < 7; i++, sz--) {
-		accum1 = (accum1 << 8) | (accum2 >> 24);
-		accum2 = (accum2 << 8) | *buf++;
-	}
+	n = 0;
+	accum1 = accum2 = 0;
 	while (sz) {
-		accum1 = (accum1 << 8) | (accum2 >> 24);
-		accum2 = (accum2 << 8) | *buf++;
+		unsigned long c = *buf++;
+		sz--;
+		accum1 = (accum1 << 7) | (accum2 >> 25);
+		accum2 = (accum2 << 7) | (accum1 >> 25);
+		accum1 += c;
+		if (++n < 64 && c != '\n')
+			continue;
 		hashval = (accum1 + accum2 * 0x61) % HASHBASE;
+		hashval |= (n << 24);
 		hash = add_spanhash(hash, hashval);
-		sz--;
+		n = 0;
+		accum1 = accum2 = 0;
 	}
 	return hash;
 }
@@ -166,9 +168,6 @@ int diffcore_count_changes(void *src, un
 	struct spanhash_top *src_count, *dst_count;
 	unsigned long sc, la;
 
-	if (src_size < 8 || dst_size < 8)
-		return -1;
-
 	src_count = dst_count = NULL;
 	if (src_count_p)
 		src_count = *src_count_p;
@@ -196,6 +195,8 @@ int diffcore_count_changes(void *src, un
 		src_cnt = s->cnt;
 		d = spanhash_find(dst_count, s->hashval);
 		dst_cnt = d ? d->cnt : 0;
+		dst_cnt *= (d->hashval >> 24);
+		src_cnt *= (d->hashval >> 24);
 		if (src_cnt < dst_cnt) {
 			la += dst_cnt - src_cnt;
 			sc += src_cnt;
-
: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html

[Index of Archives]     [Linux Kernel Development]     [Gcc Help]     [IETF Annouce]     [DCCP]     [Netdev]     [Networking]     [Security]     [V4L]     [Bugtraq]     [Yosemite]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux SCSI]     [Fedora Users]