Re: Object hash (was: Re: [ANNOUNCE] git-rev-size: calculate sizes of repository)

Johannes Schindelin <Johannes.Schindelin@xxxxxx> · Sun, 20 Aug 2006 18:51:52 +0200 (CEST)

Hi,

On Sun, 20 Aug 2006, Josef Weidendorfer wrote:

> On Sunday 20 August 2006 18:09, Johannes Schindelin wrote:
> 
> > +static unsigned int hash_index(struct hash_map *hash, const char *sha1)
> > +{
> > +	unsigned int index = *(unsigned int *)sha1;
> 
> If you have the same SHA1, stored at different addresses, you get different
> indexes for the same SHA1. Index probably should be calculated from the
> SHA1 string.

Actually, it does! "*(unsigned int *)sha1" means that the first 4 bytes 
of the sha1 are interpreted as a number.

> > +void hash_put(struct hash_map *hash, struct object *obj)
> > +{
> > +	if (++hash->nr > hash->alloc / 2)
> > +		grow_hash(hash);
> 
> If you insert the same object multiple times, hash->nr will get too big.

First, you cannot put the same object multiple times. That is not what a 
hash does (at least in this case): it stores unique objects (identified by 
their sha1 in this case). If you put another object with the same sha1, 
the first will be replaced.

Second, since you call hash_put() once per object, hash->nr cannot grow 
too big, because grow_hash() doubles hash->alloc. And I call grow_hash() 
once the hash map is half-full; Somebody once told me that would be the 
optimal growing strategy.

Ciao,
Dscho

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html