On Mon, 13 Mar 2006, Junio C Hamano wrote: > > Here is still an WIP but an insanely fast one (actually this is > a modified version of what once was in next). I haven't > verified the sanity of its output fully, but from a cursory look > what are found look sensible. The same v2.6.12..v2.6.14 test on > my Duron 750: Heh. I did something similar, except I wanted mine to work with binary data too. Not that I know how _well_ it works, but assuming you have _some_ '\n' characters to fix up offset mismatches, it might do something. Mine is a bit less hacky than yours, I believe. It doesn't skip whitespace, instead it just maintains a rolling 64-bit number, where each character shifts it left by 7 and then adds in the new character value (overflow in 32 bits just ignored). Then it uses your old hash function, except it hides the length in the top byte. It breaks the hashing on '\n' or on hitting a 64-byte sequence, whichever comes first. It's fast and stupid, but doesn't seem to do any worse than your old one. The speed comes from the fact that it only does the hash comparisons at the "block boundaries", not at every byte. Anyway, I don't think something like this is really any good for rename detection, but it might be good for deciding whether to do a real delta. Linus ---- diff --git a/diffcore-delta.c b/diffcore-delta.c index 835d82c..4c6e512 100644 --- a/diffcore-delta.c +++ b/diffcore-delta.c @@ -127,7 +127,7 @@ static struct spanhash_top *add_spanhash static struct spanhash_top *hash_chars(unsigned char *buf, unsigned long sz) { - int i; + int i, n; unsigned long accum1, accum2, hashval; struct spanhash_top *hash; @@ -137,19 +137,21 @@ static struct spanhash_top *hash_chars(u hash->free = INITIAL_FREE(i); memset(hash->data, 0, sizeof(struct spanhash) * (1<<i)); - /* an 8-byte shift register made of accum1 and accum2. New - * bytes come at LSB of accum2, and shifted up to accum1 - */ - for (i = accum1 = accum2 = 0; i < 7; i++, sz--) { - accum1 = (accum1 << 8) | (accum2 >> 24); - accum2 = (accum2 << 8) | *buf++; - } + n = 0; + accum1 = accum2 = 0; while (sz) { - accum1 = (accum1 << 8) | (accum2 >> 24); - accum2 = (accum2 << 8) | *buf++; + unsigned long c = *buf++; + sz--; + accum1 = (accum1 << 7) | (accum2 >> 25); + accum2 = (accum2 << 7) | (accum1 >> 25); + accum1 += c; + if (++n < 64 && c != '\n') + continue; hashval = (accum1 + accum2 * 0x61) % HASHBASE; + hashval |= (n << 24); hash = add_spanhash(hash, hashval); - sz--; + n = 0; + accum1 = accum2 = 0; } return hash; } @@ -166,9 +168,6 @@ int diffcore_count_changes(void *src, un struct spanhash_top *src_count, *dst_count; unsigned long sc, la; - if (src_size < 8 || dst_size < 8) - return -1; - src_count = dst_count = NULL; if (src_count_p) src_count = *src_count_p; @@ -196,6 +195,8 @@ int diffcore_count_changes(void *src, un src_cnt = s->cnt; d = spanhash_find(dst_count, s->hashval); dst_cnt = d ? d->cnt : 0; + dst_cnt *= (d->hashval >> 24); + src_cnt *= (d->hashval >> 24); if (src_cnt < dst_cnt) { la += dst_cnt - src_cnt; sc += src_cnt; - : send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html