Johannes Schindelin <johannes.schindelin@xxxxxx> writes:

> +void precompute_istate_hashes(struct cache_entry *ce)
> +{
> +	int namelen = ce_namelen(ce);
> +
> +	while (namelen > 0 && !is_dir_sep(ce->name[namelen - 1]))
> +		namelen--;
> +
> +	if (namelen <= 0) {
> +		ce->precomputed_hash.name = memihash(ce->name, ce_namelen(ce));
> +		ce->precomputed_hash.root_entry = 1;
> +	} else {
> +		namelen--;
> +		ce->precomputed_hash.dir = memihash(ce->name, namelen);
> +		ce->precomputed_hash.name = memihash_continue(
> +			ce->precomputed_hash.dir, ce->name + namelen,
> +			ce_namelen(ce) - namelen);
> +		ce->precomputed_hash.root_entry = 0;
> +	}
> +	ce->precomputed_hash.initialized = 1;
> +}
> diff --git a/preload-index.c b/preload-index.c
> index c1fe3a3ef9c..602737f9d0f 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -47,6 +47,8 @@ static void *preload_thread(void *_data)
>  		struct cache_entry *ce = *cep++;
>  		struct stat st;
>
> +		precompute_istate_hashes(ce);
> +

The fact that each preload_thread() still walks the index in-order makes
me wonder if it may allow us to further optimize the "dir" part of the
hash by passing in the previous ce for which we already precomputed the
hash values.  While the loop is iterating over paths in the same
directory, the .dir component from the previous ce can be reused and the
.name component can "continue" from it, no?

It is possible that you already tried such an optimization and rejected
it after finding that the cost of comparing pathnames to tell whether ce
and the previous ce are still in the same directory outweighs the cost
of unconditionally running memihash() over the directory part, and I am
in no way saying that I found a missed optimization opportunity you must
pursue.  I am just being curious.
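
To make the idea concrete, here is a rough sketch of what I have in
mind.  It is totally untested and only meant to illustrate the shape of
the change: the "prev" parameter and the dir_len() helper are made up
for this message, while the precomputed_hash fields, memihash(),
memihash_continue(), is_dir_sep() and ce_namelen() come from your patch
and the existing code.  The caller (e.g. the preload_thread() loop)
would pass the entry it processed in the previous iteration, or NULL
for the first one.

	/* Length of the leading directory part, without the trailing '/' */
	static int dir_len(const struct cache_entry *ce)
	{
		int namelen = ce_namelen(ce);

		while (namelen > 0 && !is_dir_sep(ce->name[namelen - 1]))
			namelen--;
		return namelen > 0 ? namelen - 1 : 0;
	}

	void precompute_istate_hashes(struct cache_entry *ce,
				      const struct cache_entry *prev)
	{
		int dirlen = dir_len(ce);

		if (!dirlen) {
			ce->precomputed_hash.name =
				memihash(ce->name, ce_namelen(ce));
			ce->precomputed_hash.root_entry = 1;
		} else {
			if (prev && prev->precomputed_hash.initialized &&
			    dir_len(prev) == dirlen &&
			    !memcmp(prev->name, ce->name, dirlen))
				/* same directory as the previous entry */
				ce->precomputed_hash.dir =
					prev->precomputed_hash.dir;
			else
				ce->precomputed_hash.dir =
					memihash(ce->name, dirlen);
			ce->precomputed_hash.name = memihash_continue(
				ce->precomputed_hash.dir, ce->name + dirlen,
				ce_namelen(ce) - dirlen);
			ce->precomputed_hash.root_entry = 0;
		}
		ce->precomputed_hash.initialized = 1;
	}

Whether the memcmp() plus the second scan for the directory separator in
dir_len(prev) is actually cheaper than simply hashing the directory part
again is of course exactly the question; remembering the previous
entry's dirlen instead of recomputing it would be an obvious refinement.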