Johannes Schindelin <johannes.schindelin@xxxxxx> writes:

> +void precompute_istate_hashes(struct cache_entry *ce)
> +{
> +	int namelen = ce_namelen(ce);
> +
> +	while (namelen > 0 && !is_dir_sep(ce->name[namelen - 1]))
> +		namelen--;
> +
> +	if (namelen <= 0) {
> +		ce->precomputed_hash.name = memihash(ce->name, ce_namelen(ce));
> +		ce->precomputed_hash.root_entry = 1;
> +	} else {
> +		namelen--;
> +		ce->precomputed_hash.dir = memihash(ce->name, namelen);
> +		ce->precomputed_hash.name = memihash_continue(
> +			ce->precomputed_hash.dir, ce->name + namelen,
> +			ce_namelen(ce) - namelen);
> +		ce->precomputed_hash.root_entry = 0;
> +	}
> +	ce->precomputed_hash.initialized = 1;
> +}
> diff --git a/preload-index.c b/preload-index.c
> index c1fe3a3ef9c..602737f9d0f 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -47,6 +47,8 @@ static void *preload_thread(void *_data)
>  		struct cache_entry *ce = *cep++;
>  		struct stat st;
>
> +		precompute_istate_hashes(ce);
> +

The fact that each preload_thread() still walks the index in-order makes
me wonder if it may allow us to further optimize the "dir" part of the
hash by passing in the previous ce for which we already precomputed the
hash values.  While the loop is iterating over paths in the same
directory, the .dir component from the previous ce can be reused and the
.name component can "continue" from it, no?

It is possible that you already tried such an optimization and rejected
it after finding that the cost of comparing pathnames to tell whether ce
and the previous ce are still in the same directory outweighs the cost
of unconditionally running memihash() over the directory part, and I am
in no way saying that I found a missed optimization opportunity you must
pursue.  I am just being curious.
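
To make the idea concrete, here is a rough sketch of what I have in
mind.  It is totally untested and only meant to illustrate the shape of
the change: the "prev" parameter and the dir_len() helper are made up
for this message, while the precomputed_hash fields, memihash(),
memihash_continue(), is_dir_sep() and ce_namelen() come from your patch
and the existing code.  The caller (e.g. the preload_thread() loop)
would pass the entry it processed in the previous iteration, or NULL
for the first one.

	/* Length of the leading directory part, without the trailing '/' */
	static int dir_len(const struct cache_entry *ce)
	{
		int namelen = ce_namelen(ce);

		while (namelen > 0 && !is_dir_sep(ce->name[namelen - 1]))
			namelen--;
		return namelen > 0 ? namelen - 1 : 0;
	}

	void precompute_istate_hashes(struct cache_entry *ce,
				      const struct cache_entry *prev)
	{
		int dirlen = dir_len(ce);

		if (!dirlen) {
			ce->precomputed_hash.name =
				memihash(ce->name, ce_namelen(ce));
			ce->precomputed_hash.root_entry = 1;
		} else {
			if (prev && prev->precomputed_hash.initialized &&
			    dir_len(prev) == dirlen &&
			    !memcmp(prev->name, ce->name, dirlen))
				/* same directory as the previous entry */
				ce->precomputed_hash.dir =
					prev->precomputed_hash.dir;
			else
				ce->precomputed_hash.dir =
					memihash(ce->name, dirlen);
			ce->precomputed_hash.name = memihash_continue(
				ce->precomputed_hash.dir, ce->name + dirlen,
				ce_namelen(ce) - dirlen);
			ce->precomputed_hash.root_entry = 0;
		}
		ce->precomputed_hash.initialized = 1;
	}

Whether the memcmp() plus the second scan for the directory separator in
dir_len(prev) is actually cheaper than simply hashing the directory part
again is of course exactly the question; remembering the previous
entry's dirlen instead of recomputing it would be an obvious refinement.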