Re: Libgit2 on the Summer of Code

Ilari Liusvaara <ilari.liusvaara@xxxxxxxxxxx> · Thu, 27 May 2010 21:35:34 +0300

On Thu, May 27, 2010 at 11:05:54AM -0700, Shawn O. Pearce wrote:
> Ilari Liusvaara <ilari.liusvaara@xxxxxxxxxxx> wrote:
> > * Where algorithm in git_revpool_table__hash() is from? Since it appears to
> > hash binary object IDs, wouldn't just simple sum/xor over words be sufficient
> > (all SHA-1 output bits are very nearly independent). Or do you need to be
> > compatible with some other implementation (doesn't appear so, because hash
> > is computed differently depending on endianess)?
> 
> If you need a hash value for a SHA-1, why not just cast the unsigned
> char* to unsigned int* and load the first int as the hash code?
> The output of SHA-1 is pretty evenly distributed, using the first
> few bytes as an int should yield a sufficient distribution throughout
> the hashtable.

Yeah, With pseudorandom function[*], all ways of reducing the output to n bits are
at most as good as just taking first n bits. But if reducing output to [0, m),
the best way (distribution-wise, not speed-wise) is to take remainder of the whole
value divided by m...

[*] SHA-1 is not pseudorandom function, but for virtually all practical purposes
it is.

-Ilari
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html