Re: [PATCH v2 4/4] describe: Delay looking up commits until searching for an inexact match

Junio C Hamano <gitster@xxxxxxxxx> · Wed, 08 Dec 2010 14:50:37 -0800

Anders Kaseorg <andersk@xxxxxxxxxxx> writes:

> Now that struct commit.util is not used until after weâve checked that
> the argument doesnât exactly match a tag, we can wait until then to
> look up the commits for each tag.
>
> This avoids a lot of I/O on --exact-match queries in repositories with
> many tags.  For example, âgit describe --exact-match HEADâ becomes
> about 12 times faster on a cold cache (3.2s instead of 39s) in a
> linux-2.6 repository with 2000 packed tags.  Thatâs a huge win for the
> interactivity of the __git_ps1 shell prompt helper when on a detached
> HEAD.

You seem to have gone in a slightly different direction with this reroll.
I am not sure if use of hash_table in this code would actually improve
anything (aside from the general readability and "reusing code from a good
infrastructure is a good thing" principle), though, no matter how many
tags you have in your repository.  In the code from the earlier round,
lookup_commit_reference_gently call in the fallback codepath to populate
commit->util used the *obj_hash[] to quickly look up the commits with the
same object name already.

> @@ -60,6 +61,15 @@ static inline struct commit_name *find_commit_name(const unsigned char *peeled)
>  	return n;
>  }
>  
> +static int set_util(void *vn)
> +{
> +	struct commit_name *n = vn;
> +	struct commit *c = lookup_commit_reference_gently(n->peeled, 1);
> +	if (c)
> +		c->util = n;
> +	return 0;
> +}

I don't know how the above would work in the face of hash collisions.

This is a callback for for_each_hash(), which visits each populated entry
in the underlying hash table.  The semantics of the hash table hash.c
gives you is "here is a flat hash table.  We insert only to an empty
field, and otherwise we will tell you where your new element would have
hashed into, so that you can implement linear overflow or chained buckets
yourself on top of this", and you chose to implement bucket chaining in
add_to_known_names(), so don't you need to walk the chain here for commits
that share the same first four bytes in their object names?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html