Philip Herron wrote: > > Question still stands is the hashing function [in hash.c], which one and why? In the spirit of teaching you to fish... First you'll want to find out where the original users of this code were. So you run git blame -- hash.c and see that most of the lines come from 9027f53 (Do linear-time/space rename logic for exact renames, 2007-10-25). So you can then look at this commit: git show 9027f53c Aha, it says In the expectation that we will indeed do the same hashing trick for the general rename case, this code uses a generic hash-table implementation that can be used for other things too. In fact, we might be able to consolidate some of our existing hash tables with the new generic code in hash.[ch] and further down in the patch + hash = hash_filespec(filespec); + pos = insert_hash(hash, entry, table); and right above that +static unsigned int hash_filespec(struct diff_filespec *filespec) +{ + unsigned int hash; + if (!filespec->sha1_valid) { + if (diff_populate_filespec(filespec, 0)) + return 0; + hash_sha1_file(filespec->data, filespec->size, "blob", filespec- + } + memcpy(&hash, filespec->sha1, sizeof(hash)); + return hash; +} See? As for the *why*, presumably because all of git assumes two objects with the same SHA1 are indeed the same file; so we can later make the same optimisation again: + if (hashcmp(one->sha1, two->sha1)) + continue; And then, as we've already computed the SHA1, any subset of it is as good a hash as anything else; it'll be uniformly distributed. -- Thomas Rast trast@{inf,student}.ethz.ch -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html